Toward creation of a cancer drug toxicity knowledge base: automatically extracting cancer drug—side effect relationships from the literature. (18th May 2013)
- Record Type:
- Journal Article
- Title:
- Toward creation of a cancer drug toxicity knowledge base: automatically extracting cancer drug—side effect relationships from the literature. (18th May 2013)
- Main Title:
- Toward creation of a cancer drug toxicity knowledge base: automatically extracting cancer drug—side effect relationships from the literature
- Authors:
- Xu, Rong
Wang, QuanQiu
- Abstract:
Objective: A comprehensive and machine-understandable cancer drug–side effect (drug–SE) relationship knowledge base is important for in silico cancer drug target discovery, drug repurposing, and toxicity prediction, and for personalized risk–benefit decisions by cancer patients. While US Food and Drug Administration (FDA) drug labels capture well-known cancer drug SE information, much cancer drug SE knowledge remains buried in the published biomedical literature. We present a relationship extraction approach to extract cancer drug–SE pairs from the literature.
Data and methods: We used 21 354 075 MEDLINE records as the text corpus. We extracted drug–SE co-occurrence pairs using a cancer drug lexicon and a clean SE lexicon that we created. We then developed two filtering approaches to remove drug–disease treatment pairs, followed by a ranking scheme to further prioritize the filtered pairs. Finally, we analyzed relationships among SEs, gene targets, and indications.
Results: We extracted 56 602 cancer drug–SE pairs. The filtering algorithms improved the precision of extracted pairs from 0.252 at baseline to 0.426, a 69% relative improvement in precision with no decrease in recall. The ranking algorithm further prioritized the filtered pairs and achieved a precision of 0.778 for top-ranked pairs. We showed that cancer drugs that share SEs tend to have overlapping gene targets and overlapping indications.
Conclusions: The relationship extraction approach is effective in extracting many cancer drug–SE pairs from the literature. This unique knowledge base, when combined with existing cancer drug SE knowledge, can facilitate drug target discovery, drug repurposing, and toxicity prediction.
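The extraction step summarized in the abstract (lexicon-based drug–SE co-occurrence, then removal of drug–disease treatment pairs) can be illustrated with a minimal sketch. It assumes co-occurrence is counted within a single sentence; the lexicons, sample sentences, and the indication-based filter are hypothetical stand-ins, not the authors' lexicons or their two filtering algorithms. The last function only checks the reported relative precision gain, (0.426 − 0.252) / 0.252 ≈ 0.69.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (NOT the authors' cancer drug and SE lexicons).
DRUG_LEXICON = {"cisplatin", "doxorubicin"}
SE_LEXICON = {"nephrotoxicity", "cardiotoxicity", "breast cancer"}

# Hypothetical sample text standing in for MEDLINE records.
SAMPLE_ABSTRACTS = [
    "Cisplatin caused nephrotoxicity in 12 patients. Doxorubicin is used to treat breast cancer.",
    "Doxorubicin-induced cardiotoxicity was observed.",
]


def extract_cooccurrences(abstracts):
    """Count (drug, SE) pairs whose lexicon terms appear in the same sentence."""
    counts = Counter()
    for text in abstracts:
        # Naive sentence split: terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
            # Plain substring matching; a real system would tokenize and normalize terms.
            drugs = {d for d in DRUG_LEXICON if d in sentence}
            effects = {s for s in SE_LEXICON if s in sentence}
            for drug in drugs:
                for effect in effects:
                    counts[(drug, effect)] += 1
    return counts


def filter_treatment_pairs(counts, known_indications):
    """Drop pairs whose 'SE' term is a disease the drug treats (hypothetical stand-in filter)."""
    return Counter({pair: n for pair, n in counts.items() if pair not in known_indications})


def relative_gain(before, after):
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


if __name__ == "__main__":
    pairs = extract_cooccurrences(SAMPLE_ABSTRACTS)
    filtered = filter_treatment_pairs(pairs, {("doxorubicin", "breast cancer")})
    print(sorted(filtered))  # [('cisplatin', 'nephrotoxicity'), ('doxorubicin', 'cardiotoxicity')]
    print(round(relative_gain(0.252, 0.426), 2))  # 0.69
```

In this toy run the filter removes the treatment pair (doxorubicin, breast cancer) while keeping the two genuine side effects, which mirrors the goal, if not the method, of the filtering step described above.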
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 21:Number 1(2014:Jan.)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 21:Number 1(2014:Jan.)
- Issue Display:
- Volume 21, Issue 1 (2014)
- Year:
- 2014
- Volume:
- 21
- Issue:
- 1
- Issue Sort Value:
- 2014-0021-0001-0000
- Page Start:
- 90
- Page End:
- 96
- Publication Date:
- 2013-05-18
- Subjects:
- Information Extraction
Text Mining
Cancer Drug Toxicity
Natural Language Processing
Drug Target Discovery
Drug Repurposing
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
- Journal URLs:
- http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
- DOI:
- 10.1136/amiajnl-2012-001584
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 26899.xml