ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images. Issue 2 (25th February 2022)
- Record Type:
- Journal Article
- Title:
- ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images. Issue 2 (25th February 2022)
- Main Title:
- ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images
- Authors:
- Zhang, Xiao-Chen
Yi, Jia-Cai
Yang, Guo-Ping
Wu, Cheng-Kun
Hou, Ting-Jun
Cao, Dong-Sheng
- Abstract:
- Structural information for chemical compounds is often described by pictorial images in most scientific documents, which cannot be easily understood and manipulated by computers. This dilemma makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from an enormous amount of literature. However, existing OCSR methods fall far short of our expectations for realistic requirements due to their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Based on the divide-and-conquer principle, we propose to model an atom or a bond as a single point in the center. In this way, we can leverage a fully convolutional neural network (CNN) to generate a series of heat-maps to identify these points and predict relevant properties, such as atom types, atom charges, bond types and other properties. Thus, the molecular structure can be recovered by assembling the detected atoms and bonds. Our approach integrates all the detection and property prediction tasks into a single fully CNN, which is scalable and capable of processing molecular images quite efficiently. Experimental results demonstrate that our method could achieve a significant improvement in recognition performance compared with publicly available tools. The proposed method could be considered as a promising solution to OCSR problems and a starting point for the acquisition of molecular information in the literature.
- Is Part Of:
- Briefings in bioinformatics. Volume 23:Issue 2(2022)
- Journal:
- Briefings in bioinformatics
- Issue:
- Volume 23:Issue 2(2022)
- Issue Display:
- Volume 23, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 23
- Issue:
- 2
- Issue Sort Value:
- 2022-0023-0002-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-02-25
- Subjects:
- optical chemical structure recognition -- divide and conquer -- deep learning -- fully convolutional neural network
Genetics -- Data processing -- Periodicals
Molecular biology -- Data processing -- Periodicals
Genomes -- Data processing -- Periodicals
572.80285
- Journal URLs:
- http://bib.oxfordjournals.org
http://www.oxfordjournals.org/content?genre=journal&issn=1477-4054
http://ukcatalogue.oup.com/
http://firstsearch.oclc.org
- DOI:
- 10.1093/bib/bbac033
- Languages:
- English
- ISSNs:
- 1467-5463
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2283.958363
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20750.xml
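
The abstract describes modelling each atom and bond as a single centre point located on CNN heat-maps, then recovering the molecule by assembling the detected points. A minimal sketch of that idea follows, in Python with NumPy. It is not the authors' implementation: the function names `find_peaks` and `assemble_bonds`, the 0.5 threshold, the 3x3 local-maximum rule and the midpoint-based pairing of atoms to bonds are all assumptions made for illustration, and the toy heat-maps stand in for real network output.

```python
import itertools

import numpy as np


def find_peaks(heatmap, threshold=0.5):
    """Return (row, col) cells that reach the threshold and are the
    maximum of their 3x3 neighbourhood (assumed peak rule)."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Nine shifted views of the padded map cover every 3x3 neighbourhood.
    neighbours = np.stack(
        [padded[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)]
    )
    is_peak = (heatmap >= neighbours.max(axis=0)) & (heatmap >= threshold)
    return [tuple(int(v) for v in rc) for rc in np.argwhere(is_peak)]


def assemble_bonds(atom_points, bond_points):
    """For each bond centre, pick the pair of atoms whose midpoint lies
    closest to it (assumed pairing rule; needs at least two atoms)."""
    atoms = np.asarray(atom_points, dtype=float)
    bonds = []
    for centre in np.asarray(bond_points, dtype=float):
        best = min(
            itertools.combinations(range(len(atoms)), 2),
            key=lambda ij: np.linalg.norm(
                (atoms[ij[0]] + atoms[ij[1]]) / 2 - centre
            ),
        )
        bonds.append(best)
    return bonds


# Toy heat-maps for a three-atom chain drawn as an "L" shape.
atom_map = np.zeros((9, 9))
for r, c in [(2, 2), (2, 6), (6, 6)]:
    atom_map[r, c] = 0.9
atom_map[2, 3] = 0.4  # weak response below the threshold, ignored

bond_map = np.zeros((9, 9))
for r, c in [(2, 4), (4, 6)]:
    bond_map[r, c] = 0.8

atoms = find_peaks(atom_map)
bonds = assemble_bonds(atoms, find_peaks(bond_map))
print(atoms)  # [(2, 2), (2, 6), (6, 6)]
print(bonds)  # [(0, 1), (1, 2)]
```

In this sketch each bond is reported as a pair of indices into the detected atom list. Property prediction (atom types, charges, bond types), which the abstract assigns to further heat-maps, is left out.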