Multi-scale vision transformer classification model with self-supervised learning and dilated convolution. (October 2022)
- Record Type:
- Journal Article
- Title:
- Multi-scale vision transformer classification model with self-supervised learning and dilated convolution. (October 2022)
- Main Title:
- Multi-scale vision transformer classification model with self-supervised learning and dilated convolution
- Authors:
- Xing, Liping
Jin, Hongmei
Li, Hong-an
Li, Zhanli
- Abstract:
- Benefiting from the advantages of good parallelism and features that support long-distance dependency modeling, a variety of ViT models based on the self-attention mechanism show outstanding performance in image classification tasks. However, these works have poor classification accuracy when trained on small datasets owing to insufficient attention toward local features. Therefore, this study presents a new self-supervised multi-scale ViT classification model, SMvT. This model adopts twin-tower architectures as the self-supervised framework and the hierarchical Swin Transformer as the backbone and proposes an MDP embedding layer to fully pay attention to local details. We investigated the model's performance pretrained using the lightweight ImageNet dataset. Compared with recent self-supervised Transformers for vision, such as MoCo v3 and DINO, the classification accuracy of the SMvT has improved by up to 15.2%. SMvT combines self-supervised learning and depthwise separable dilated convolution, which is a lightweight and high generalization ViT model supporting cross-scale attention modeling. Highlights: Constructed a self-supervised vision Transformer classification model. Proposed a multi-scale dilated pyramid (MDP) embedding layer. Designed a depthwise separable dilated convolution (DSD conv).
- Is Part Of:
- Computers & electrical engineering. Volume 103(2022)
- Journal:
- Computers & electrical engineering
- Issue:
- Volume 103(2022)
- Issue Display:
- Volume 103, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 103
- Issue:
- 2022
- Issue Sort Value:
- 2022-0103-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-10
- Subjects:
- Self-attention -- Vision transformer -- Multi-scale -- Self-supervised -- Depthwise separable dilated convolution -- Local features
Computer engineering -- Periodicals
Electrical engineering -- Periodicals
Electrical engineering -- Data processing -- Periodicals
Ordinateurs -- Conception et construction -- Périodiques
Électrotechnique -- Périodiques
Électrotechnique -- Informatique -- Périodiques
Computer engineering
Electrical engineering
Electrical engineering -- Data processing
Periodicals
Electronic journals
621.302854
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00457906/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compeleceng.2022.108270
- Languages:
- English
- ISSNs:
- 0045-7906
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.680000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24061.xml