CMA: Cross-modal attention for 6D object pose estimation. (June 2021)
- Record Type:
- Journal Article
- Title:
- CMA: Cross-modal attention for 6D object pose estimation. (June 2021)
- Main Title:
- CMA: Cross-modal attention for 6D object pose estimation
- Authors:
- Zou, Lu
Huang, Zhangjin
Wang, Fangjun
Yang, Zhouwang
Wang, Guoping
- Abstract:
- Highlights: We present CMA, a novel cross-modal data fusion approach that incorporates the attention mechanism. CMA extracts discriminative cross-modal features which are more robust to 6D object pose estimation. We evaluate our method on two widely used datasets: LINEMOD dataset and YCB-Video dataset. Experimental results demonstrate that our method achieves superior performance on both datasets over the state-of-the-art methods as well as high-efficiency.
Abstract: Deep learning methods for 6D object pose estimation based on RGB and depth (RGB-D) images have been successfully applied to robotic manipulation and grasping. Among these approaches, the fusion of RGB and depth modalities is one of the most critical issues. Most existing works performed fusion via either simple concatenation, or element-wise multiplication of the features generated by these two modalities. Despite achieving impressive progress, such fusion strategies do not explicitly consider the different contributions of RGB and depth modalities, leaving a gap for performance enhancement. In this paper, we present a Cross-Modal Attention (CMA) component for the problem of 6D object pose estimation. With the attention mechanism, features of two different modalities are aggregated adaptively through the attention weights, such that powerful representations from the RGB-D images can be efficiently extracted. Comprehensive experiments on both LINEMOD and YCB-Video datasets demonstrate that the proposed approach achieves state-of-the-art performance.
- Is Part Of:
- Computers & graphics. Volume 97(2021)
- Journal:
- Computers & graphics
- Issue:
- Volume 97(2021)
- Issue Display:
- Volume 97, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 97
- Issue:
- 2021
- Issue Sort Value:
- 2021-0097-2021-0000
- Page Start:
- 139
- Page End:
- 147
- Publication Date:
- 2021-06
- Subjects:
- 6D object pose estimation -- Cross-modal data fusion -- Attention mechanism
Computer graphics -- Periodicals
006.6
- Journal URLs:
- http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cag.2021.04.018
- Languages:
- English
- ISSNs:
- 0097-8493
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.700000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17245.xml