A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history. (9th February 2022)
- Record Type:
- Journal Article
- Title:
- A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history. (9th February 2022)
- Main Title:
- A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history
- Authors:
- Maurits, Marc P
Korsunsky, Ilya
Raychaudhuri, Soumya
Murphy, Shawn N
Smoller, Jordan W
Weiss, Scott T
Petukhova, Lynn M
Weng, Chunhua
Wei, Wei-Qi
Huizinga, Thomas W J
Reinders, Marcel J T
Karlson, Elizabeth W
van den Akker, Erik B
Knevel, Rachel - Abstract:
- Abstract: Objective: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. Material and Methods: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. Results: We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2–8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. Discussion: Costly clinical research ventures should be based onAbstract: Objective: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. Material and Methods: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. Results: We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2–8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. Discussion: Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. Conclusion: We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes. … (more)
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 29:Number 5(2022)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 29:Number 5(2022)
- Issue Display:
- Volume 29, Issue 5 (2022)
- Year:
- 2022
- Volume:
- 29
- Issue:
- 5
- Issue Sort Value:
- 2022-0029-0005-0000
- Page Start:
- 761
- Page End:
- 769
- Publication Date:
- 2022-02-09
- Subjects:
- electronic health records -- clustering -- electronic medical records -- ICD -- PhenoGraph -- eMERGE
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285 - Journal URLs:
- http://jamia.bmj.com/ ↗
http://www.jamia.org ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76 ↗
http://www.sciencedirect.com/science/journal/10675027 ↗
http://jamia.oxfordjournals.org/ ↗
http://www.oxfordjournals.org/en/ ↗ - DOI:
- 10.1093/jamia/ocac008 ↗
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 27011.xml