A CNN-transformer hybrid approach for decoding visual neural activity into text. (February 2022)
- Record Type:
- Journal Article
- Title:
- A CNN-transformer hybrid approach for decoding visual neural activity into text. (February 2022)
- Main Title:
- A CNN-transformer hybrid approach for decoding visual neural activity into text
- Authors:
- Zhang, Jiang
Li, Chen
Liu, Ganwanming
Min, Min
Wang, Chong
Li, Jiyi
Wang, Yuting
Yan, Hongmei
Zuo, Zhentao
Huang, Wei
Chen, Huafu
- Abstract:
- Highlights: A CNN-Transformer hybrid decoding model is proposed to decode visual neural activity evoked by natural images into text describing the visual stimuli. A specific Transformer architecture is investigated to improve decoding performance. The roles of visual duration, attention mapping, and visual regions are explored to understand the underlying neural mechanisms in the human brain.
Abstract: Background and Objective: Most studies have used neural activity evoked by linguistic stimuli, such as phrases or sentences, to decode language structure. However, the human brain perceives the outside world far more often through non-linguistic stimuli such as natural images, so relying on linguistic stimuli alone cannot fully capture the information the brain perceives. An end-to-end model that maps visual neural activity evoked by non-linguistic stimuli to visual content is therefore needed.
Methods: Inspired by the success of the Transformer network in neural machine translation and of the convolutional neural network (CNN) in computer vision, we construct an end-to-end CNN-Transformer hybrid language decoding model that decodes functional magnetic resonance imaging (fMRI) signals evoked by natural images into descriptive text about the visual stimuli. Specifically, a two-layer 1D CNN first extracts a semantic sequence from the multi-time visual neural activity; the model then encodes this sequence into a multi-level abstract representation and decodes that representation, step by step, into an English sentence.
Results: Experimental results show that the decoded texts are semantically consistent with the corresponding ground-truth annotations. Additionally, by varying the encoding and decoding layers and modifying the original positional encoding of the Transformer, we found that this task requires a specific Transformer architecture.
Conclusions: The results indicate that the proposed model can decode visual neural activity evoked by natural images into sentence-form descriptions of the visual stimuli. It may therefore serve as a computer-aided tool to help neuroscientists understand the neural mechanisms of visual information processing in the human brain.
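The Methods description above can be illustrated with a minimal front-end sketch in Python (standard library only). It is not the authors' code: the record gives no kernel sizes, channel counts, activation function, or weights, so every choice below is a hypothetical assumption. This includes convolving along the time axis with voxels as input channels, kernel size 3, ReLU, random Gaussian weights, and the helper names `conv1d`, `cnn_front_end`, and so on. The sketch shows how a two-layer 1D CNN can turn a multi-time fMRI sample (time points × voxels) into a short token sequence, to which the original sinusoidal Transformer positional encoding is added. The abstract reports that the authors modified this encoding; their modified version is not shown.

```python
import math
import random

# Hypothetical front end for a CNN-Transformer fMRI decoder.
# Layer sizes, kernel size, ReLU and random weights are illustrative
# assumptions; the source record does not specify them.


def conv1d(x, weights, bias):
    """Valid (unpadded) 1D convolution along the time axis, stride 1.

    x:       T frames, each a list of C_in values (e.g. voxels).
    weights: C_out filters, each K taps of C_in values.
    bias:    C_out values.
    Returns T - K + 1 frames of C_out values.
    """
    k = len(weights[0])
    out = []
    for t in range(len(x) - k + 1):
        frame = []
        for w_o, b_o in zip(weights, bias):
            s = b_o
            for j in range(k):
                s += sum(w * v for w, v in zip(w_o[j], x[t + j]))
            frame.append(s)
        out.append(frame)
    return out


def relu(x):
    return [[max(0.0, v) for v in frame] for frame in x]


def random_conv_params(c_in, c_out, k, rng):
    # Hypothetical initialisation: small Gaussian weights, zero bias.
    w = [[[rng.gauss(0.0, 0.1) for _ in range(c_in)] for _ in range(k)]
         for _ in range(c_out)]
    return w, [0.0] * c_out


def sinusoidal_positional_encoding(length, d_model):
    # Original Transformer encoding:
    # PE[pos][2i] = sin(pos / 10000^(2i/d)), PE[pos][2i+1] = cos(same angle).
    pe = []
    for pos in range(length):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def cnn_front_end(fmri, d_model=8, hidden=16, kernel=3, seed=0):
    """Two-layer 1D CNN over a (time x voxel) fMRI sample, plus positional encoding.

    Returns T - 2*(kernel - 1) token vectors of width d_model that a
    Transformer encoder could consume.
    """
    rng = random.Random(seed)
    n_voxels = len(fmri[0])
    w1, b1 = random_conv_params(n_voxels, hidden, kernel, rng)
    w2, b2 = random_conv_params(hidden, d_model, kernel, rng)
    h = relu(conv1d(fmri, w1, b1))       # layer 1: voxels -> hidden channels
    tokens = relu(conv1d(h, w2, b2))     # layer 2: hidden -> d_model channels
    pe = sinusoidal_positional_encoding(len(tokens), d_model)
    return [[a + b for a, b in zip(tok, p)] for tok, p in zip(tokens, pe)]


if __name__ == "__main__":
    # Toy input: 10 time points x 20 voxels of synthetic values.
    fmri = [[0.01 * (t + v) for v in range(20)] for t in range(10)]
    seq = cnn_front_end(fmri)
    print(len(seq), len(seq[0]))  # 6 8
```

Each unpadded convolution with kernel size 3 shortens the sequence by two time points, so 10 input frames yield 6 tokens. In a full model this sequence would be passed to a Transformer encoder, and a decoder would generate the sentence one word at a time; both stages are omitted here.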
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 214(2022)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 214(2022)
- Issue Display:
- Volume 214, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 214
- Issue:
- 2022
- Issue Sort Value:
- 2022-0214-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-02
- Subjects:
- Deep learning -- Brain decoding -- Transformer -- CNN -- fMRI
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2021.106586
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20621.xml