Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?. Issue 15 (15th August 2022)
- Record Type:
- Journal Article
- Title:
- Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?. Issue 15 (15th August 2022)
- Main Title:
- Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?
- Authors:
- Nagpal, Sunil
Pinna, Nishal Kumar
Pant, Namrata
Singh, Rohan
Srivastava, Divyanshu
Mande, Sharmila S.
- Abstract:
- Highlights: Patient severity labelled genomes can aid genotype-guided predictive modeling. Machine-learnt models may capture signs of severity discriminants in virus-genotype. Each geography has peculiar determinants. Severity after all is multimodal. Explainable AI can aid insights into the model features: age, genotype, epitopic load. Unless temporally benchmarked, ML models must be cautiously employed for predictive prognosis.
Abstract: Motivation: Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants in particular have exerted tremendous pressure on the global healthcare system owing to their life-threatening and debilitating implications. The sheer plurality of variants and the huge scale of genomic data have added to the challenges of tracing the mutations/variants and their relationship to infection severity (if any).
Results: We explored the suitability of virus-genotype guided machine-learning in infection prognosis and identification of features/mutations-of-interest. A total of 199,519 outcome-traced genomes, representing 45,625 nucleotide mutations, were employed. Among these, post data-cleaning, Low and High severity genomes were classified using an integrated model (employing virus genotype, epitopic-influence and patient-age) with consistently high ROC-AUC (Asia: 0.97 ± 0.01, Europe: 0.94 ± 0.01, N. America: 0.92 ± 0.02, Africa: 0.94 ± 0.07, S. America: 0.93 ± 03). Although virus-genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), the performance was not consistent, and the models for a few geographies displayed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus-genotype (Wilcoxon p_BH < 0.05). Neither age nor epitopic-influence nor clade information could outperform the integrated features. A sparse model (6 features), developed using patient-age and epitopic-influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were employed for inferring the important mutations-of-interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were subsequently probed.
Notably, we also describe the significance of a 'temporal-modeling approach' to benchmark the models linked with continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving the severity, caution should be exercised in using the genotypic signatures for predictive prognosis.
- Is Part Of:
- Journal of molecular biology. Volume 434:Issue 15(2022)
- Journal:
- Journal of molecular biology
- Issue:
- Volume 434:Issue 15(2022)
- Issue Display:
- Volume 434, Issue 15 (2022)
- Year:
- 2022
- Volume:
- 434
- Issue:
- 15
- Issue Sort Value:
- 2022-0434-0015-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-08-15
- Subjects:
- SARS-CoV-2 -- Predictive prognosis -- Machine learning -- Mutation identification -- Temporal benchmarking
SARS-CoV-2 Severe Acute Respiratory Syndrome Coronavirus 2 -- VoC Variant of Concern -- ML Machine Learning -- HLA Human Leukocyte Antigens -- MHC Major Histocompatibility Complex -- SHAP SHapley Additive exPlanations -- GISAID Global Initiative on Sharing All Influenza Data -- ROC Receiver Operator Characteristic -- ROC AUC Area Under the ROC Curve -- t-SNE t-distributed Stochastic Neighbor Embedding -- PCA Principal Component Analysis -- UMAP Uniform Manifold Approximation and Projection -- XGBoost eXtreme Gradient Boosting -- VREs Variants of Reference Epitopes
Molecular biology -- Periodicals
Biology -- Periodicals
Biochemistry -- Periodicals
Bacteriology -- Periodicals
Molecular Biology -- Periodicals
Biochemistry -- Periodicals
Biologie moléculaire -- Périodiques
Biologie -- Périodiques
Biochimie -- Périodiques
Moleculaire biologie
Biochemistry
Biology
Molecular biology
Periodicals
572.805
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00222836
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.jmb.2022.167684
- Languages:
- English
- ISSNs:
- 0022-2836
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5020.700000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22587.xml