Hybrid bag of approaches to characterize selection criteria for cohort identification. (14th June 2019)
- Record Type:
- Journal Article
- Title:
- Hybrid bag of approaches to characterize selection criteria for cohort identification. (14th June 2019)
- Main Title:
- Hybrid bag of approaches to characterize selection criteria for cohort identification
- Authors:
- Vydiswaran, V G Vinod
Strayhorn, Asher
Zhao, Xinyan
Robinson, Phil
Agarwal, Mahesh
Bagazinski, Erin
Essiet, Madia
Iott, Bradley E
Joo, Hyeon
Ko, PingJui
Lee, Dahee
Lu, Jin Xiu
Liu, Jinghui
Murali, Adharsh
Sasagawa, Koki
Wang, Tianshi
Yuan, Nalingna
- Abstract:
- Objective: The 2018 National NLP Clinical Challenge (2018 n2c2) focused on the task of cohort selection for clinical trials, where participating systems were tasked with analyzing longitudinal patient records to determine if the patients met or did not meet any of the 13 selection criteria. This article describes our participation in this shared task. Materials and Methods: We followed a hybrid approach combining pattern-based, knowledge-intensive, and feature weighting techniques. After preprocessing the notes using publicly available natural language processing tools, we developed individual criterion-specific components that relied on collecting knowledge resources relevant for these criteria and pattern-based and weighting approaches to identify "met" and "not met" cases. Results: As part of the 2018 n2c2 challenge, 3 runs were submitted. The overall micro-averaged F1 on the training set was 0.9444. On the test set, the micro-averaged F1 for the 3 submitted runs were 0.9075, 0.9065, and 0.9056. The best run was placed second in the overall challenge and all 3 runs were statistically similar to the top-ranked system. A reimplemented system achieved the best overall F1 of 0.9111 on the test set. Discussion: We highlight the need for a focused resource-intensive effort to address the class imbalance in the cohort selection identification task. Conclusion: Our hybrid approach was able to identify all selection criteria with high F1 performance on both training and test sets. Based on our participation in the 2018 n2c2 task, we conclude that there is merit in continuing a focused criterion-specific analysis and developing appropriate knowledge resources to build a quality cohort selection system.
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 26:Number 11(2019)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 26:Number 11(2019)
- Issue Display:
- Volume 26, Issue 11 (2019)
- Year:
- 2019
- Volume:
- 26
- Issue:
- 11
- Issue Sort Value:
- 2019-0026-0011-0000
- Page Start:
- 1172
- Page End:
- 1180
- Publication Date:
- 2019-06-14
- Subjects:
- natural language processing [L01.224.065.580] -- information storage and retrieval [L01.313.500.750.280] -- information systems [L01.313.500.750.300] -- cohort identification -- clinical trial selection criteria
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
- Journal URLs:
- http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
- DOI:
- 10.1093/jamia/ocz079
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 22004.xml