Accelerated curation of checkpoint inhibitor-induced colitis cases from electronic health records. Issue 1 (1st April 2023)
- Record Type:
- Journal Article
- Title:
- Accelerated curation of checkpoint inhibitor-induced colitis cases from electronic health records. Issue 1 (1st April 2023)
- Main Title:
- Accelerated curation of checkpoint inhibitor-induced colitis cases from electronic health records
- Authors:
- Rahman, Protiva
Ye, Cheng
Mittendorf, Kathleen F
Lenoue-Newton, Michele
Micheel, Christine
Wolber, Jan
Osterman, Travis
Fabbri, Daniel
- Abstract:
- Objective: Automatically identifying patients at risk of immune checkpoint inhibitor (ICI)-induced colitis allows physicians to improve patient care. However, predictive models require training data curated from electronic health records (EHR). Our objective is to automatically identify notes documenting ICI-colitis cases to accelerate data curation. Materials and Methods: We present a data pipeline to automatically identify ICI-colitis from EHR notes, accelerating chart review. The pipeline relies on BERT, a state-of-the-art natural language processing (NLP) model. The first stage of the pipeline segments long notes using keywords identified through a logistic classifier and applies BERT to identify ICI-colitis notes. The next stage uses a second BERT model tuned to identify false positive notes and remove notes that were likely positive for mentioning colitis as a side-effect. The final stage further accelerates curation by highlighting the colitis-relevant portions of notes. Specifically, we use BERT's attention scores to find high-density regions describing colitis. Results: The overall pipeline identified colitis notes with 84% precision and reduced the curator note review load by 75%. The segment BERT classifier had a high recall of 0.98, which is crucial to identify the low incidence (<10%) of colitis. Discussion: Curation from EHR notes is a burdensome task, especially when the curation topic is complicated. Methods described in this work are not only useful for ICI colitis but can also be adapted for other domains. Conclusion: Our extraction pipeline reduces manual note review load and makes EHR data more accessible for research. Lay Summary: Patients treated with immune checkpoint inhibitors (ICI) often experience colitis as a side-effect. Building predictive models for ICI-induced colitis can help healthcare providers improve patient care. However, developing predictive models requires training data from electronic health record notes since ICI colitis does not have clear diagnosis codes and can be described in varied language. Using keyword search to identify relevant notes returns over 200 000 notes, only 10% of which are true positives based on manual review. To address this problem, we developed a data pipeline to automatically identify ICI-induced colitis notes. This pipeline consists of 3 stages. The first stage identifies potentially positive ICI colitis notes. The second stage filters the output from the first stage to remove false positives. The final stage highlights sections of the notes relevant for ICI colitis determination to aid manual reviewers. Using our pipeline, the manual review burden was reduced by 75% (from 128K to 30K notes).
- Is Part Of:
- JAMIA open. Volume 6:Issue 1(2023)
- Journal:
- JAMIA open
- Issue:
- Volume 6:Issue 1(2023)
- Issue Display:
- Volume 6, Issue 1 (2023)
- Year:
- 2023
- Volume:
- 6
- Issue:
- 1
- Issue Sort Value:
- 2023-0006-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04-01
- Subjects:
- deep learning -- curation -- information extraction -- EHR
Medical informatics -- Periodicals
610.285
- Journal URLs:
- http://www.oxfordjournals.org/
https://academic.oup.com/jamiaopen
- DOI:
- 10.1093/jamiaopen/ooad017
- Languages:
- English
- ISSNs:
- 2574-2531
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26778.xml
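
Illustrative note on the abstract: the final pipeline stage is described as using BERT's attention scores to find high-density regions describing colitis. The record does not say how those regions are computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each token already has a single attention score and treats a "high-density region" as the fixed-width window of tokens with the largest total score. The function name `densest_window`, the window width, and the sample scores are all hypothetical.

```python
from typing import List, Tuple


def densest_window(scores: List[float], width: int) -> Tuple[int, int]:
    """Return (start, end) of the contiguous window with the highest total score.

    `scores` holds one hypothetical attention score per token; `end` is exclusive.
    """
    if width <= 0 or not scores:
        raise ValueError("need a positive width and at least one score")
    width = min(width, len(scores))  # a short note is a single window

    # Sliding-window sum: add the token entering the window, drop the one leaving.
    window_sum = sum(scores[:width])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(scores) - width + 1):
        window_sum += scores[start + width - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return best_start, best_start + width


# Made-up tokens and scores, for illustration only.
tokens = ["patient", "denies", "fever", "severe", "diarrhea",
          "and", "colitis", "noted", "today"]
scores = [0.01, 0.02, 0.02, 0.20, 0.30, 0.05, 0.35, 0.03, 0.02]

start, end = densest_window(scores, width=4)
highlighted = " ".join(tokens[start:end])
print(highlighted)
```

A curator-facing tool could then highlight `tokens[start:end]` in the note. A real system would also need to decide how to collapse attention across heads and layers into one score per token, which this sketch leaves open.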