Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation. (26th April 2019)
- Record Type:
- Journal Article
- Title:
- Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation. (26th April 2019)
- Main Title:
- Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation
- Authors:
- Sholle, Evan T
Pinheiro, Laura C
Adekkanattu, Prakash
Davila, Marcos A
Johnson, Stephen B
Pathak, Jyotishman
Sinha, Sanjai
Li, Cassidie
Lubansky, Stasi A
Safford, Monika M
Campion, Thomas R
- Abstract:
- Objective: We aimed to address deficiencies in structured electronic health record (EHR) data for race and ethnicity by identifying black and Hispanic patients from unstructured clinical notes and assessing differences between patients with or without structured race/ethnicity data. Materials and Methods: Using EHR notes for 16 665 patients with encounters at a primary care practice, we developed rule-based natural language processing (NLP) algorithms to classify patients as black/Hispanic. We evaluated performance of the method against an annotated gold standard, compared race and ethnicity between NLP-derived and structured EHR data, and compared characteristics of patients identified as black or Hispanic using only NLP vs patients identified as such only in structured EHR data. Results: For the sample of 16 665 patients, NLP identified 948 additional patients as black, a 26% increase, and 665 additional patients as Hispanic, a 20% increase. Compared with the patients identified as black or Hispanic in structured EHR data, patients identified as black or Hispanic via NLP only were older, more likely to be male, less likely to have commercial insurance, and more likely to have higher comorbidity. Discussion: Structured EHR data for race and ethnicity are subject to data quality issues. Supplementing structured EHR race data with NLP-derived race and ethnicity may allow researchers to better assess the demographic makeup of populations and draw more accurate conclusions about intergroup differences in health outcomes. Conclusions: Black or Hispanic patients who are not documented as such in structured EHR race/ethnicity fields differ significantly from those who are. Relatively simple NLP can help address this limitation.
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 26:Number 8/9(2019)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 26:Number 8/9(2019)
- Issue Display:
- Volume 26, Issue 8/9 (2019)
- Year:
- 2019
- Volume:
- 26
- Issue:
- 8/9
- Issue Sort Value:
- 2019-0026-NaN-0000
- Page Start:
- 722
- Page End:
- 729
- Publication Date:
- 2019-04-26
- Subjects:
- race -- ethnicity -- natural language processing -- electronic health record
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
- Journal URLs:
- http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
- DOI:
- 10.1093/jamia/ocz040
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15260.xml