Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection. (22nd July 2020)
- Record Type:
- Journal Article
- Title:
- Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection. (22nd July 2020)
- Main Title:
- Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection
- Authors:
- Zhang, Xiaoguo
Gao, Ye
Wang, Huiqing
Wang, Qing
- Abstract:
- Effectively and efficiently recognizing multi-scale objects is one of the key challenges in applying deep convolutional neural networks to object detection. YOLOv3 (You Only Look Once v3) is a state-of-the-art object detector that performs well in both accuracy and speed; however, scale variation remains a challenging problem. Because detection performance on multi-scale objects is related to the receptive fields of the network, this work proposes a novel dilated spatial pyramid module that integrates multi-scale information to deal with the scale variation problem. First, the input of the dilated spatial pyramid is fed into multiple parallel branches with different dilation rates to generate feature maps with different receptive fields. Then, the input of the dilated spatial pyramid and the outputs of the branches are concatenated to integrate multi-scale information. Finally, the dilated spatial pyramid is inserted into YOLOv3 in front of the first detection head, giving the dilated spatial pyramid-You Only Look Once model. Experimental results on PASCAL VOC2007 demonstrate that the dilated spatial pyramid-You Only Look Once model outperforms other state-of-the-art methods in mean average precision while still keeping a satisfactory real-time detection speed. For a 416 × 416 input, the model achieves 82.2% mean average precision at 56 frames per second, 3.9% higher than YOLOv3, with only a slight drop in speed. …
- Is Part Of:
- International journal of advanced robotic systems. Volume 17:Number 4(2020:Jul./Aug.)
- Journal:
- International journal of advanced robotic systems
- Issue:
- Volume 17:Number 4(2020:Jul./Aug.)
- Issue Display:
- Volume 17, Issue 4 (2020)
- Year:
- 2020
- Volume:
- 17
- Issue:
- 4
- Issue Sort Value:
- 2020-0017-0004-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-07-22
- Subjects:
- Real-time object detection -- YOLOv3 -- scale variation -- dilated spatial pyramid -- receptive fields
Robotics -- Periodicals
Robotics
Periodicals
629.892
- Journal URLs:
- http://arx.sagepub.com/
http://search.epnet.com/direct.asp?db=bch&jid=13CR&scope=site
http://www.intechweb.org/journal.php?id=3
http://www.uk.sagepub.com/home.nav
- DOI:
- 10.1177/1729881420936062
- Languages:
- English
- ISSNs:
- 1729-8806
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13693.xml
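The dilated spatial pyramid module described in the abstract can be illustrated with a minimal sketch. It assumes a single-channel feature map, one shared 3 × 3 kernel for every branch, and hypothetical dilation rates (1, 2, 3). The record does not give the paper's actual rates, channel counts, or weights, and a real module would use separate learned multi-channel convolutions per branch. The function names `effective_kernel`, `dilated_conv2d` and `dilated_spatial_pyramid` are illustrative only. Each branch is a dilated convolution with "same" zero padding, so every output keeps the input's height and width and can be concatenated with the input along the channel axis. A k × k kernel with dilation d spans k + (k − 1)(d − 1) pixels, which is how the parallel branches obtain different receptive fields.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2-D feature map


def effective_kernel(k: int, d: int) -> int:
    """Spatial span of a k x k kernel with dilation d: k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x: Grid, w: Grid, d: int) -> Grid:
    """Single-channel 2-D cross-correlation (as in CNN libraries) with dilation d,
    stride 1 and zero 'same' padding, so the output has the input's height and width.
    Assumes an odd-sized square kernel w."""
    h, wd = len(x), len(x[0])
    k = len(w)
    pad = (effective_kernel(k, d) - 1) // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced d pixels apart (the "holes" of a dilated conv).
                    r = i - pad + a * d
                    c = j - pad + b * d
                    if 0 <= r < h and 0 <= c < wd:  # zero padding outside the map
                        acc += w[a][b] * x[r][c]
            out[i][j] = acc
    return out


def dilated_spatial_pyramid(x: Grid, w: Grid, rates: List[int]) -> List[Grid]:
    """Feed x into parallel branches (one per hypothetical dilation rate) and
    concatenate the input with every branch output along the channel axis.
    Simplification: all branches share the kernel w."""
    branches = [dilated_conv2d(x, w, d) for d in rates]
    return [x] + branches  # channel list: input first, then each branch


if __name__ == "__main__":
    # Impulse input: a single 1.0 in the centre of a 5 x 5 map.
    x = [[0.0] * 5 for _ in range(5)]
    x[2][2] = 1.0
    w = [[1.0] * 3 for _ in range(3)]  # all-ones 3 x 3 kernel
    maps = dilated_spatial_pyramid(x, w, rates=[1, 2, 3])
    for name, m in zip(["input", "d=1", "d=2", "d=3"], maps):
        print(name, "span =", effective_kernel(3, int(name[2:])) if name != "input" else "-",
              "total =", sum(map(sum, m)))
```

With the impulse input, the d = 1 and d = 2 branches each spread the single activation to nine positions, while on a 5 × 5 map the d = 3 taps mostly fall into the padding, which shows how larger rates look further away. For reference, YOLOv3 with a 416 × 416 input makes its first detection on a 13 × 13 grid (stride 32); calling `dilated_spatial_pyramid` on a 13 × 13 map likewise returns `1 + len(rates)` maps of size 13 × 13.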