Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. (7th March 2023)
- Record Type:
- Journal Article
- Title:
- Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. (7th March 2023)
- Main Title:
- Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII
- Authors:
- Leaman, Robert
Islamaj, Rezarta
Adams, Virginia
Alliheedi, Mohammed A
Almeida, João Rafael
Antunes, Rui
Bevan, Robert
Chang, Yung-Chun
Erdengasileng, Arslan
Hodgskiss, Matthew
Ida, Ryuki
Kim, Hyunjae
Li, Keqiao
Mercer, Robert E
Mertová, Lukrécia
Mobasher, Ghadeer
Shin, Hoo-Chang
Sung, Mujeen
Tsujimura, Tomoki
Yeh, Wen-Chao
Lu, Zhiyong
- Abstract:
- The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall). This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/
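The F-scores quoted in the abstract can be related to the precision and recall values given alongside them. The sketch below assumes that "F-score" means the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state this explicitly. The function name `f_score` and the `reported` dictionary are illustrative and are not taken from the challenge materials.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; this is not stated there.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score), copied from the abstract.
reported = {
    "strict NER": (0.8759, 0.8587, 0.8672),
    "strict normalization": (0.8621, 0.7702, 0.8136),
    "chemical indexing": (0.7417, 0.5141, 0.6073),
}

for task, (p, r, f_reported) in reported.items():
    print(f"{task}: computed {f_score(p, r):.4f}, reported {f_reported:.4f}")
```

Under that assumption, the computed values agree with the reported F-scores to four decimal places for all three settings.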
- Is Part Of:
- Database. Volume 2023(2023)
- Journal:
- Database
- Issue:
- Volume 2023(2023)
- Issue Display:
- Volume 2023, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 2023
- Issue:
- 2023
- Issue Sort Value:
- 2023-2023-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-03-07
- Subjects:
- Biology -- Databases -- Periodicals
Bioinformatics -- Periodicals
570.285
- Journal URLs:
- http://database.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/database/baad005
- Languages:
- English
- ISSNs:
- 1758-0463
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26137.xml