SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis. (June 2021)
- Record Type:
- Journal Article
- Title:
- SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis. (June 2021)
- Main Title:
- SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis
- Authors:
- Peng, Dunlu
Yang, Wuchen
Liu, Cong
Lü, Shuairui
- Abstract:
- Synthesizing photo-realistic images from text descriptions is a challenging task in computer vision. Although generative adversarial networks have made significant breakthroughs in this task, they still face major challenges in generating high-quality, visually realistic images that are consistent with the semantics of the text. Generally, existing text-to-image methods accomplish this task in two steps: first generating an initial image with a rough outline and color, and then gradually producing a high-resolution image from the initial image. However, one drawback of these methods is that, if the quality of the initial image is not high, it is hard to generate a satisfactory high-resolution image. In this paper, we propose SAM-GAN, Self-Attention supporting Multi-stage Generative Adversarial Networks, for text-to-image synthesis. With the self-attention mechanism, the model can establish the multi-level dependence of the image and fuse the sentence- and word-level visual-semantic vectors to improve the quality of the generated image. Furthermore, a multi-stage perceptual loss is introduced to enhance the semantic similarity between the synthesized image and the real image, thus enhancing the visual-semantic consistency between text and images. To improve the diversity of the generated images, a mode seeking regularization term is integrated into the model. The results of extensive experiments and ablation studies, conducted on the Caltech-UCSD Birds and Microsoft Common Objects in Context datasets, show that our model is superior to competitive models in text-to-image synthesis.
Highlights: SAM-GAN, a novel GAN-based model, is proposed to improve the quality of generated images. A multi-stage perceptual loss module is designed to generate high-quality images consistent with the text description. A regularization term is employed to improve the diversity of image backgrounds. Extensive experiments and ablation studies show that our model performs well on the text-to-image task.
- Is Part Of:
- Neural networks. Volume 138(2021)
- Journal:
- Neural networks
- Issue:
- Volume 138(2021)
- Issue Display:
- Volume 138, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 138
- Issue:
- 2021
- Issue Sort Value:
- 2021-0138-2021-0000
- Page Start:
- 57
- Page End:
- 67
- Publication Date:
- 2021-06
- Subjects:
- Text-to-image synthesis -- SAM-GAN -- Self-attention mechanism -- Machine learning
Neural computers -- Periodicals
Neural networks (Computer science) -- Periodicals
Neural networks (Neurobiology) -- Periodicals
Nervous System -- Periodicals
Ordinateurs neuronaux -- Périodiques
Réseaux neuronaux (Informatique) -- Périodiques
Réseaux neuronaux (Neurobiologie) -- Périodiques
Neural computers
Neural networks (Computer science)
Neural networks (Neurobiology)
Periodicals
006.32
- Journal URLs:
- http://www.sciencedirect.com/science/journal/08936080
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.neunet.2021.01.023
- Languages:
- English
- ISSNs:
- 0893-6080
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 6081.280800
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16322.xml