LEAP: Using machine learning to support variant classification in a clinical setting. Issue 6 (1st April 2020)
- Record Type:
- Journal Article
- Title:
- LEAP: Using machine learning to support variant classification in a clinical setting. Issue 6 (1st April 2020)
- Main Title:
- LEAP: Using machine learning to support variant classification in a clinical setting
- Authors:
- Lai, Carmen
Zimmer, Anjali D.
O'Connor, Robert
Kim, Serra
Chan, Ray
van den Akker, Jeroen
Zhou, Alicia Y.
Topper, Scott
Mishne, Gilad
- Abstract:
- Advances in genome sequencing have led to a tremendous increase in the discovery of novel missense variants, but evidence for determining clinical significance can be limited or conflicting. Here, we present Learning from Evidence to Assess Pathogenicity (LEAP), a machine learning model that utilizes a variety of feature categories to classify variants, and achieves high performance in multiple genes and different health conditions. Feature categories include functional predictions, splice predictions, population frequencies, conservation scores, protein domain data, and clinical observation data such as personal and family history and covariant information. L2‐regularized logistic regression and random forest classification models were trained on missense variants detected and classified during the course of routine clinical testing at Color Genomics (14,226 variants from 24 cancer‐related genes and 5,398 variants from 30 cardiovascular‐related genes). Using 10‐fold cross‐validated predictions, the logistic regression model achieved an area under the receiver operating characteristic curve (AUROC) of 97.8% (cancer) and 98.8% (cardiovascular), while the random forest model achieved 98.3% (cancer) and 98.6% (cardiovascular). We demonstrate generalizability to different genes by validating predictions on genes withheld from training (96.8% AUROC). High accuracy and broad applicability make LEAP effective in the clinical setting as a high‐throughput quality control layer.
Abstract: LEAP, a machine learning model for variant classification, was developed with explainability in mind. The expected variant classification and supporting evidence are data‐driven, and are displayed in a web application to aid variant scientists in a clinical reporting workflow. Contributing evidence features are ranked based on overall significance, and contribution magnitude and direction (pathogenic or benign driver) are displayed and color coded. As additional context, percentiles are shown for numeric features with respect to the distribution observed in the training data, and past classifications for similar variants (based on gene, chromosome, and exon) are listed.
- Is Part Of:
- Human mutation. Volume 41:Issue 6(2020)
- Journal:
- Human mutation
- Issue:
- Volume 41:Issue 6(2020)
- Issue Display:
- Volume 41, Issue 6 (2020)
- Year:
- 2020
- Volume:
- 41
- Issue:
- 6
- Issue Sort Value:
- 2020-0041-0006-0000
- Page Start:
- 1079
- Page End:
- 1090
- Publication Date:
- 2020-04-01
- Subjects:
- clinical genetics -- genetic testing -- machine learning -- variant classification
Human chromosome abnormalities -- Periodicals
Mutation (Biology) -- Periodicals
616.04205
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-1004
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/humu.24011
- Languages:
- English
- ISSNs:
- 1059-7794
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4336.217000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13158.xml
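
The abstract above reports AUROC from 10‐fold cross‐validated predictions and from genes withheld from training. As a reading aid only, the following is a minimal sketch of those two evaluation ideas in plain Python. It is not the authors' code: the function names (`auroc`, `kfold_indices`, `holdout_by_gene`), the fold assignment rule, and the toy labels, scores, and gene names are all assumptions made for illustration.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def kfold_indices(n_items, k=10):
    """Yield (train, test) index lists; item i is assigned to fold i % k
    (an illustrative rule, not necessarily the one used in the paper)."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def holdout_by_gene(genes, withheld):
    """Split indices so every variant in a withheld gene is test-only."""
    train = [i for i, g in enumerate(genes) if g not in withheld]
    test = [i for i, g in enumerate(genes) if g in withheld]
    return train, test


# Hypothetical toy data: 1 = pathogenic, 0 = benign; scores are model outputs.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.1]
genes = ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_C", "GENE_B"]

toy_auroc = auroc(labels, scores)
folds = list(kfold_indices(len(labels), k=3))
gene_train, gene_test = holdout_by_gene(genes, {"GENE_C"})
```

In a cross‐validated evaluation, each variant is scored by a model trained on the folds that exclude it, and the AUROC is then computed over the pooled out‐of‐fold scores. The gene‐level split is stricter: no variant from a withheld gene is seen during training.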