Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia. (1st June 2020)
- Record Type:
- Journal Article
- Title:
- Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia. (1st June 2020)
- Main Title:
- Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
- Authors:
- Coombes, Caitlin E
Abrams, Zachary B
Li, Suli
Abruzzo, Lynne V
Coombes, Kevin R
- Abstract:
- Objective: Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes. Methods: To address this challenge, we applied k-medoids clustering with 10 distance metrics to 2 experiments ("A" and "B") with mixed clinical features collapsed to binary vectors and visualized with both multidimensional scaling and t-stochastic neighbor embedding. To assess prognostic utility, we performed survival analysis using a Cox proportional hazard model, log-rank test, and Kaplan-Meier curves. Results: In both experiments, survival analysis revealed a statistically significant association between clusters and survival outcomes (A: overall survival, P = .0164; B: time from diagnosis to treatment, P = .0039). Multidimensional scaling separated clusters along a gradient mirroring the order of overall survival. Longer survival was associated with mutated immunoglobulin heavy-chain variable region gene (IGHV) status, absent ZAP70 expression, female sex, and younger age. Conclusions: This approach to mixed-type data handling and selection of distance metric captured well-understood, binary, prognostic markers in chronic lymphocytic leukemia (sex, IGHV mutation status, ZAP70 expression status) with high fidelity. …
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 27:Number 7(2020)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 27:Number 7(2020)
- Issue Display:
- Volume 27, Issue 7 (2020)
- Year:
- 2020
- Volume:
- 27
- Issue:
- 7
- Issue Sort Value:
- 2020-0027-0007-0000
- Page Start:
- 1019
- Page End:
- 1027
- Publication Date:
- 2020-06-01
- Subjects:
- unsupervised machine learning -- clustering -- chronic lymphocytic leukemia -- clinical informatics, mixed-type data
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
- Journal URLs:
- http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
- DOI:
- 10.1093/jamia/ocaa060
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15146.xml