Feature learning network with transformer for multi-label image classification. (April 2023)
- Record Type:
- Journal Article
- Title:
- Feature learning network with transformer for multi-label image classification. (April 2023)
- Main Title:
- Feature learning network with transformer for multi-label image classification
- Authors:
- Zhou, Wei
Dou, Peng
Su, Tao
Hu, Haifeng
Zheng, Zhijie
- Abstract:
- Highlights: A novel framework termed FL-Tran is proposed to solve the multi-label image classification task. A multi-scale fusion mechanism is designed to align high-level and low-level features to learn multi-scale features. A spatial attention mechanism based on a transformer encoder is developed to capture the salient object features in images. A feature enhancement and suppression mechanism is proposed to excavate various potentially useful features by suppressing, stage by stage, the most salient feature in the feature maps. Experiments on three publicly available datasets validate the superior performance of the proposed FL-Tran model compared with state-of-the-art methods.
Abstract: The purpose of the multi-label image classification task is to accurately assign a set of labels to the objects in images. Although promising results have been achieved, most existing methods cannot effectively learn multi-scale features, so it is difficult for them to identify small-scale objects in images. Besides, current attention-based methods tend to learn the most salient feature regions in images but fail to excavate various potentially useful features concealed by the most salient feature, thus limiting further improvement of model performance. To address the above issues, we propose a novel Feature Learning network based on Transformer to learn salient features and excavate potentially useful features (FL-Tran). Specifically, to address the difficulty current methods have in identifying small-scale objects, we first present a novel multi-scale fusion module (MSFM) to align high-level and low-level features to learn multi-scale features. Additionally, a spatial attention module (SAM) utilizing a transformer encoder is introduced to capture salient object features in images and enhance model performance. Furthermore, we devise a feature enhancement and suppression module (FESM) with the aim of excavating potentially useful features concealed by the most salient features. By suppressing the most salient features obtained in the current SAM layer, and then forcing the subsequent SAM layer to excavate potential salient features in the feature maps, the FL-Tran model can learn various useful features more comprehensively. Extensive experiments on the MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets demonstrate that our proposed FL-Tran model outperforms current state-of-the-art methods.
- Is Part Of:
- Pattern recognition. Volume 136(2023)
- Journal:
- Pattern recognition
- Issue:
- Volume 136(2023)
- Issue Display:
- Volume 136, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 136
- Issue:
- 2023
- Issue Sort Value:
- 2023-0136-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- Multi-label classification -- Transformer -- Multi-scale features -- Spatial attention -- Salient features -- Feature suppression
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2022.109203
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25681.xml