Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties. (11th December 2020)
- Record Type:
- Journal Article
- Title:
- Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties. (11th December 2020)
- Main Title:
- Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties
- Authors:
- Méndez-Cruz, Carlos-Francisco
Blanchet, Antonio
Godínez, Alan
Arroyo-Fernández, Ignacio
Gama-Castro, Socorro
Martínez-Luna, Sara Berenice
González-Colín, Cristian
Collado-Vides, Julio
- Abstract:
- Transcription factors (TFs) play a main role in transcriptional regulation of bacteria, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort due to the overwhelming amount of biomedical literature, which increases every day. The development of automatic approaches for knowledge extraction to assist curation is therefore critical. Here, we show an effective approach for knowledge extraction to assist curation of summaries describing bacterial TF properties based on an automatic text summarization strategy. We were able to recover automatically a median 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12 by processing 5961 scientific articles. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, as we trained our predictive model with manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to the manual curation of 10 of these Salmonella typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility to assist manual curation to expand manual summaries with new knowledge automatically extracted and to create new summaries of bacteria for which these curation efforts do not exist. Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available in GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).
- Is Part Of:
- Database. Volume 2020(2020)
- Journal:
- Database
- Issue:
- Volume 2020(2020)
- Issue Display:
- Volume 2020, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 2020
- Issue:
- 2020
- Issue Sort Value:
- 2020-2020-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-12-11
- Subjects:
- Biology -- Databases -- Periodicals
Bioinformatics -- Periodicals
570.285
- Journal URLs:
- http://database.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/database/baaa109
- Languages:
- English
- ISSNs:
- 1758-0463
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26038.xml