Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing. (30th December 2019)
- Record Type:
- Journal Article
- Title:
- Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing. (30th December 2019)
- Main Title:
- Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing
- Authors:
- Wu, Chao
Zhao, Xiaonan
Welsh, Mark
Costello, Kellianne
Cao, Kajia
Abou Tayoun, Ahmad
Li, Marilyn
Sarmady, Mahdi
- Abstract:
- BACKGROUND: Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. We present a machine learning–based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing from nonformalin-fixed paraffin-embedded tumor specimens. METHODS: A cohort of 11278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of the clinical laboratory workflow. A 3-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned with the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label "uncertain" variants. RESULTS: The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5587 SNVs of the test set. Overall, 1252 of 1341 true-positive variants were identified as real, 4143 of 4246 false-positive calls were deemed artifacts, whereas only 192 (3.4%) SNVs were labeled as "uncertain," with zero misclassification between the true positives and artifacts in the test set. CONCLUSIONS: We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received definitive labels and thus were exempt from manual review. This framework could improve quality and efficiency of the variant review process in clinical laboratories.
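The abstract says prediction intervals were used to assign the "uncertain" class but does not describe the procedure. One plausible way to turn a per-variant interval into three labels is sketched here in Python. It assumes, hypothetically, that each SNV receives a set of scores from ensemble members, that an approximate 95% percentile interval is taken over those scores, and that 0.5 is the decision threshold. The function names and parameters are illustrative only, not the authors' implementation.

```python
from statistics import quantiles


def percentile_interval(member_scores, n=40):
    """Return an approximate 95% interval over hypothetical ensemble scores.

    With n=40, quantiles() returns 39 cut points: the first is the
    2.5th percentile and the last is the 97.5th percentile.
    """
    cuts = quantiles(member_scores, n=n, method="inclusive")
    return cuts[0], cuts[-1]


def label_variant(member_scores, threshold=0.5):
    """Hypothetical 3-class rule: commit to a definitive label only when
    the whole interval lies on one side of the decision threshold."""
    low, high = percentile_interval(member_scores)
    if low > threshold:
        return "real"
    if high < threshold:
        return "artifact"
    return "uncertain"


print(label_variant([0.90, 0.92, 0.88, 0.95, 0.91]))  # real
print(label_variant([0.10, 0.05, 0.20, 0.15]))        # artifact
print(label_variant([0.30, 0.70, 0.45, 0.60]))        # uncertain
```

Widening the interval or moving the threshold trades fewer definitive labels for fewer misclassifications, which mirrors the balance the abstract reports: no real/artifact confusions, with a small share of variants left for manual review.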
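The reported test-set figures are internally consistent. Because no variant was misclassified between the two definitive classes, every true positive not labeled real and every artifact not labeled artifact must be among the "uncertain" calls. This short Python check reproduces the 5587-variant total, the 192 uncertain calls, and the 96.6% definitive-label rate from the counts given in the abstract.

```python
# Test-set tallies as reported in the abstract.
true_variants = 1341    # manually confirmed real SNVs
artifacts = 4246        # manually confirmed artifact calls
called_real = 1252      # real SNVs the classifier labeled "real"
called_artifact = 4143  # artifacts the classifier labeled "artifact"

total = true_variants + artifacts
# Zero cross-misclassification means every variant without its correct
# definitive label must have been labeled "uncertain".
uncertain = (true_variants - called_real) + (artifacts - called_artifact)

print(total)                                 # 5587
print(uncertain)                             # 192
print(f"{uncertain / total:.1%}")            # 3.4%
print(f"{(total - uncertain) / total:.1%}")  # 96.6%
```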
- Is Part Of:
- Clinical chemistry. Volume 66:Number 1(2020)
- Journal:
- Clinical chemistry
- Issue:
- Volume 66:Number 1(2020)
- Issue Display:
- Volume 66, Issue 1 (2020)
- Year:
- 2020
- Volume:
- 66
- Issue:
- 1
- Issue Sort Value:
- 2020-0066-0001-0000
- Page Start:
- 239
- Page End:
- 246
- Publication Date:
- 2019-12-30
- Subjects:
- Clinical chemistry -- Periodicals
Pharmaceutical chemistry -- Periodicals
Biochemistry -- Periodicals
Biochimie -- Périodiques (Biochemistry -- Periodicals)
Diagnostics biologiques -- Périodiques (Biological diagnostics -- Periodicals)
Biochemistry
Clinical chemistry
Pharmaceutical chemistry
Biochemistry
Laboratory Techniques and Procedures
Klinische chemie (Clinical chemistry)
Periodicals
616.075605
- Journal URLs:
- http://www.oxfordjournals.org/
https://academic.oup.com/clinchem
http://catalog.hathitrust.org/api/volumes/oclc/1554929.html
http://www.clinchem.org/
- DOI:
- 10.1373/clinchem.2019.308213
- Languages:
- English
- ISSNs:
- 0009-9147
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15141.xml