Understanding the mutational frequency in SARS-CoV-2 proteome using structural features. (August 2022)
- Record Type:
- Journal Article
- Title:
- Understanding the mutational frequency in SARS-CoV-2 proteome using structural features. (August 2022)
- Main Title:
- Understanding the mutational frequency in SARS-CoV-2 proteome using structural features
- Authors:
- Rawat, Puneet
Sharma, Divya
Pandey, Medha
Prabakaran, R.
Gromiha, M. Michael
- Abstract:
- The prolonged transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus in the human population has led to demographic divergence and the emergence of several location-specific clusters of viral strains. Although the effect of mutation(s) on severity and survival of the virus is still unclear, it is evident that certain sites in the viral proteome are more/less prone to mutations. In fact, millions of SARS-CoV-2 sequences collected all over the world have provided us a unique opportunity to understand viral protein mutations and develop novel computational approaches to predict mutational patterns. In this study, we have classified the mutation sites into low and high mutability classes based on viral isolates count containing mutations. The physicochemical features and structural analysis of the SARS-CoV-2 proteins showed that features including residue type, surface accessibility, residue bulkiness, stability and sequence conservation at the mutation site were able to classify the low and high mutability sites. We further developed machine learning models using above-mentioned features, to predict low and high mutability sites at different selection thresholds (ranging 5–30% of topmost and bottommost mutated sites) and observed the improvement in performance as the selection threshold is reduced (prediction accuracy ranging from 65 to 77%). The analysis will be useful for early detection of variants of concern for the SARS-CoV-2, which can also be applied to other existing and emerging viruses for another pandemic prevention.
Highlights: Analyzed the sequence and structural features of SARS-CoV-2 proteome to understand the mutability of protein sites. Residue properties, surface accessibility and sequence conservation are able to classify the low and high mutability sites. Developed machine-learning models for predicting low/high mutability sites in SARS-CoV-2 proteome. Revealed the characteristic features of variants of interest/concern based on physicochemical properties.
- Is Part Of:
- Computers in biology and medicine. Volume 147(2022)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 147(2022)
- Issue Display:
- Volume 147, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 147
- Issue:
- 2022
- Issue Sort Value:
- 2022-0147-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-08
- Subjects:
- COVID-19 -- SARS-CoV-2 -- Mutation -- Protein mutability -- Machine learning
SARS-CoV-2 Severe acute respiratory syndrome coronavirus 2 -- ACE2 Angiotensin-converting enzyme 2 -- HIV Human immunodeficiency virus -- ML Machine learning -- VOI Variants of interest -- VOC Variants of concern -- SNP Single nucleotide polymorphism -- PSSM Position specific scoring matrix -- ROC Receiver operating characteristic -- AUC Area under the curve -- SVM Support vector machine -- LOOCV Leave-one-out cross-validation -- WHO World Health Organization -- PWM Position weight matrix -- IC Information content -- SMO Sequential minimal optimization -- MOI Mutations of interest -- MOC Mutations of concern
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2022.105708
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22279.xml