EliIE: An open-source information extraction system for clinical trial eligibility criteria. (1st April 2017)
- Record Type:
- Journal Article
- Title:
- EliIE: An open-source information extraction system for clinical trial eligibility criteria. (1st April 2017)
- Main Title:
- EliIE: An open-source information extraction system for clinical trial eligibility criteria
- Authors:
- Kang, Tian
Zhang, Shaodian
Tang, Youlan
Hruby, Gregory W
Rusanov, Alexander
Elhadad, Noémie
Weng, Chunhua
- Abstract:
- Objective: To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0.
Materials and Methods: EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling–based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring.
Results: In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation.
Conclusions: This study presents EliIE, an OMOP CDM–based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, machine learning-based EliIE outperforms existing systems and shows promise to improve. …
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 24:Number 6(2017:Nov.)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 24:Number 6(2017:Nov.)
- Issue Display:
- Volume 24, Issue 6 (2017)
- Year:
- 2017
- Volume:
- 24
- Issue:
- 6
- Issue Sort Value:
- 2017-0024-0006-0000
- Page Start:
- 1062
- Page End:
- 1071
- Publication Date:
- 2017-04-01
- Subjects:
- natural language processing -- machine learning -- clinical trials -- patient selection -- common data model -- named entity recognition
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
- Journal URLs:
- http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
- DOI:
- 10.1093/jamia/ocx019
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15171.xml