PVF-DectNet: Multi-modal 3D detection network based on Perspective-Voxel fusion. (April 2023)
- Record Type:
- Journal Article
- Title:
- PVF-DectNet: Multi-modal 3D detection network based on Perspective-Voxel fusion. (April 2023)
- Main Title:
- PVF-DectNet: Multi-modal 3D detection network based on Perspective-Voxel fusion
- Authors:
- Wang, Ke
Zhou, Tianqiang
Zhang, Zhichuang
Chen, Tao
Chen, Junlan
- Abstract:
- Detecting small objects such as pedestrians remains a challenge for LiDAR-based 3D object detection because point clouds are sparse and unordered. Camera images, by contrast, provide rich semantic information that makes such small objects easier to detect. To exploit the advantages of both sensors and achieve better 3D object detection, the fusion of LiDAR and camera information is an active research topic. Existing methods for fusing point clouds and images usually weight the point clouds more heavily, so the semantic information in the images is not fully utilized. We propose a new fusion method, PVFusion, that aims to fuse more image features. We first assign each point to a separate perspective voxel and project the voxel onto the image feature maps. The semantic feature of the perspective voxel is then fused with the geometric feature of the point. A 3D object detection model, PVF-DectNet, is designed using PVFusion. During training we employ ground truth paste (GT-Paste) data augmentation and solve the occlusion problem caused by newly added objects. On the KITTI validation set, PVF-DectNet shows a 3.6% AP improvement over the other feature fusion methods in pedestrian detection. On the KITTI test set, PVF-DectNet outperforms the other multi-modal SOTA methods by 2.2% AP in pedestrian detection. PVFusion also shows better detection performance than PointFusion on sparse point clouds in both the car and pedestrian categories. For a 32-beam LiDAR scene, there is a 4.2% AP increase in the moderate-difficulty car category and a 5.2% mAP improvement in the pedestrian category.
Highlights: The information loss of existing feature fusion methods is pointed out for the first time. The proposed PVFusion model fuses more image features in LiDAR-camera 3D object detection. GT-Paste solves the occlusion problem during multi-modal data augmentation. The assembled PVF-DectNet model outperforms other multi-modal pedestrian detection methods. The PVFusion model shows better performance on reduced-beam LiDAR scenes.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 120(2023)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 120(2023)
- Issue Display:
- Volume 120, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 120
- Issue:
- 2023
- Issue Sort Value:
- 2023-0120-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- 3D object detection -- Sensor fusion -- Deep learning -- Small objects -- LiDAR-camera-based detector
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2023.105951
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26180.xml
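
Illustrative note on the abstract: the fusion step it describes (project a per-point region onto the image feature map, then combine the resulting semantic feature with the point's geometric feature) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The record does not say how a perspective voxel is constructed or how features are fused, so the sketch substitutes a fixed square pixel window around the projected point for the voxel footprint and uses plain concatenation. The function names, the 3x4 projection matrix `P`, and the `half` window parameter are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
FeatureMap = Sequence[Sequence[Sequence[float]]]  # indexed [row][col][channel]


def project_point(P: Matrix, xyz: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Project a 3D point (camera frame) to pixel coordinates (u, v) with a 3x4 matrix."""
    h = [xyz[0], xyz[1], xyz[2], 1.0]
    u, v, w = (sum(P[r][c] * h[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None  # point lies behind the camera
    return u / w, v / w


def pool_window(feat: FeatureMap, u: float, v: float, half: int) -> Optional[List[float]]:
    """Average features over a (2*half+1)^2 pixel window, clipped to the map.

    ASSUMPTION: the window is a stand-in for the projected perspective voxel.
    """
    height, width, channels = len(feat), len(feat[0]), len(feat[0][0])
    cu, cv = int(round(u)), int(round(v))
    rows = range(max(cv - half, 0), min(cv + half, height - 1) + 1)
    cols = range(max(cu - half, 0), min(cu + half, width - 1) + 1)
    cells = [feat[r][c] for r in rows for c in cols]
    if not cells:
        return None  # window falls entirely outside the image
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(channels)]


def fuse_point(P: Matrix, feat: FeatureMap, xyz: Sequence[float],
               geom: Sequence[float], half: int = 1) -> Optional[List[float]]:
    """Concatenate a point's geometric feature with its pooled image feature.

    ASSUMPTION: concatenation is used here only for illustration.
    """
    uv = project_point(P, xyz)
    if uv is None:
        return None
    sem = pool_window(feat, uv[0], uv[1], half)
    if sem is None:
        return None
    return list(geom) + sem


if __name__ == "__main__":
    P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]  # toy pinhole camera, focal length 1
    feat = [[[float(r * 4 + c)] for c in range(4)] for r in range(3)]  # 3x4 map, 1 channel
    print(fuse_point(P, feat, (2.0, 1.0, 1.0), [2.0, 1.0, 1.0], half=0))
```

Pooling over a window rather than a single pixel is one simple way to let each point draw on more image features than a point-wise lookup would; the actual method may differ substantially.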