Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms. (April 2021)
- Record Type:
- Journal Article
- Title:
- Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms. (April 2021)
- Main Title:
- Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms
- Authors:
- Qin, Xinyi
Liu, Min
Zhang, Lu
Liu, Guangzhong
- Abstract:
- Highlights: Fold recognition is a commonly used method of protein structural classification, which is used to determine the tertiary structure of proteins, and many researchers have studied this problem in recent years. Feature extraction methods based on the secondary structure and evolutionary information of proteins give better results. Tree-based machine learning algorithms, including Random Forest, XGBoost and LightGBM, perform better than the KNN, SVM, SimpleRNN and LSTM algorithms. Combining the LightGBM feature selection algorithm with the IFS method makes the accuracy of fold recognition higher.
Abstract: Understanding protein function supports research in advanced fields such as gene therapy of diseases and the development and design of new drugs. Determining a protein's tertiary structure is a prerequisite for understanding its function. Protein structure classification is indispensable to this problem, and fold recognition is a commonly used method of protein structure classification. In the current work, protein sequences of 40% identity in the ASTRAL protein classification database are used for fold recognition research to predict 27 fold types, most of which belong to four protein structural classes: α, β, α/β and α+β. Features are extracted from the primary structure of proteins using DSSP, PSSM and HMM, methods based on secondary structure and evolutionary information, to convert protein sequences into feature vectors that machine learning algorithms can process. The LightGBM feature selection algorithm is combined with the incremental feature selection (IFS) method to find the optimal classifier for each of the tree-based machine learning algorithms: Random Forest, XGBoost and LightGBM. Bayesian optimization is used to tune the hyper-parameters of the machine learning algorithms, bringing the accuracy of fold recognition to as high as 93.45%. The result obtained by the proposed model is outstanding in the study of protein fold recognition.
- Is Part Of:
- Computational biology and chemistry. Volume 91(2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 91(2021)
- Issue Display:
- Volume 91, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 91
- Issue:
- 2021
- Issue Sort Value:
- 2021-0091-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04
- Subjects:
- Protein fold recognition -- ASTRAL -- Secondary structure -- Evolutionary information -- Feature selection algorithm -- IFS
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107456
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 16176.xml
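
Note on the method: the abstract describes ranking features with LightGBM and then applying incremental feature selection (IFS). The snippet below is a minimal sketch of the generic IFS loop only, not the authors' implementation. It assumes the features have already been ranked from most to least important (in the paper this ranking would come from LightGBM; here it is a hand-written, hypothetical list) and that a scoring callback returns a quality measure such as cross-validated accuracy. The names `incremental_feature_selection`, `toy_evaluate` and the feature labels are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Generic IFS: score the top-1, top-2, ... top-n ranked features
    and keep the prefix with the highest score.

    `ranked` is assumed to be sorted from most to least important.
    `evaluate` is a hypothetical callback that trains and scores a
    classifier on the given feature subset.
    """
    if not ranked:
        raise ValueError("ranked feature list must not be empty")

    best_k, best_score = 0, float("-inf")
    history: List[Tuple[int, float]] = []

    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])       # add one feature per step
        score = evaluate(subset)
        history.append((k, score))
        if score > best_score:          # ties keep the smaller subset
            best_k, best_score = k, score

    return list(ranked[:best_k]), best_score, history


def toy_evaluate(subset: List[str]) -> float:
    """Stand-in for cross-validated accuracy (purely illustrative):
    rewards three 'informative' features, penalises subset size."""
    informative = {"pssm_1", "hmm_4", "dssp_2"}
    hits = sum(1 for name in subset if name in informative)
    return hits / 3 - 0.01 * len(subset)


# Hypothetical ranking, as if produced by a feature-importance step.
ranking = ["pssm_1", "hmm_4", "noise_a", "dssp_2", "noise_b"]
selected, score, history = incremental_feature_selection(ranking, toy_evaluate)
print(selected, round(score, 2))
```

In practice `evaluate` would wrap a real classifier and a cross-validation routine; the toy scorer exists only so the loop can be run without external libraries.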