Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping. (June 2022)
- Record Type:
- Journal Article
- Title:
- Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping. (June 2022)
- Main Title:
- Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping
- Authors:
- Link, Nicholas B.
Huang, Sicong
Cai, Tianrun
Sun, Jiehuan
Dahal, Kumar
Costa, Lauren
Cho, Kelly
Liao, Katherine
Cai, Tianxi
Hong, Chuan
- Abstract:
- Highlights: Acronym disambiguation – identifying the meaning of an acronym – is important for information retrieval in clinical EHR systems. Most acronym disambiguation methods rely on manual annotation. We propose a novel unsupervised method, CASEml, that uses the surrounding words as well as visit information to disambiguate acronyms. CASEml performs as well as or better than a state-of-the-art knowledge-based method. We demonstrate the utility of CASEml for downstream NLP tasks using clinical EHR text.
Abstract:
Objective: The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings), and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in clinical EHR notes.
Methods: We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text, and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis.
Results: CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis.
Conclusion: CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
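The abstract compares CASEml against a baseline that always selects the most frequent acronym sense and reports accuracy. The sketch below illustrates only that baseline idea for the binary case; it is not the authors' code, and the function names, the toy labels, and the example mentions are all hypothetical and chosen for illustration.

```python
from collections import Counter


def most_frequent_sense_baseline(train_labels):
    """Build a predictor that always returns the majority label.

    Labels are binary: 1 = target sense, 0 = any other sense.
    """
    majority_label, _ = Counter(train_labels).most_common(1)[0]

    def predict(mentions):
        # The context text is ignored on purpose: this is a frequency-only baseline.
        return [majority_label for _ in mentions]

    return predict


def accuracy(predicted, gold):
    """Fraction of mentions whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy data (not from the paper): 1 = "rheumatoid arthritis".
train_labels = [1, 1, 0, 1, 0, 1]
test_mentions = [
    "hx of RA on methotrexate",
    "RA enlarged on echo",
    "seropositive RA, follow up in clinic",
    "RA pressure normal",
]
test_gold = [1, 0, 1, 0]

predict = most_frequent_sense_baseline(train_labels)
baseline_accuracy = accuracy(predict(test_mentions), test_gold)
```

Because the baseline never looks at the surrounding words, its accuracy equals the share of test mentions that carry the majority sense, which is why a context-aware method is expected to exceed it whenever the senses are reasonably balanced.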
- Is Part Of:
- International journal of medical informatics. Volume 162(2022)
- Journal:
- International journal of medical informatics
- Issue:
- Volume 162(2022)
- Issue Display:
- Volume 162, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 162
- Issue:
- 2022
- Issue Sort Value:
- 2022-0162-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- Acronym disambiguation -- Electronic health records -- Natural language processing -- Predictive modeling -- Semantic embedding -- Unsupervised learning
EHR Electronic Health Records -- NLP Natural Language Processing
Medical informatics -- Periodicals
Information science -- Periodicals
Computers -- Periodicals
Medical technology -- Periodicals
Medical Informatics -- Periodicals
Technology, Medical -- Periodicals
Computers
Information science
Medical informatics
Medical technology
Electronic journals
Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/13865056
http://www.clinicalkey.com/dura/browse/journalIssue/13865056
http://www.clinicalkey.com.au/dura/browse/journalIssue/13865056
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ijmedinf.2022.104753
- Languages:
- English
- ISSNs:
- 1386-5056
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.345250
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21643.xml