Sparse group selection and analysis of function‐related residue for protein‐state recognition. Issue 20 (3rd June 2022)
- Record Type:
- Journal Article
- Title:
- Sparse group selection and analysis of function‐related residue for protein‐state recognition. Issue 20 (3rd June 2022)
- Main Title:
- Sparse group selection and analysis of function‐related residue for protein‐state recognition
- Authors:
- Bai, Fangyun
Puk, Kin Ming
Liu, Jin
Zhou, Hongyu
Tao, Peng
Zhou, Wenyong
Wang, Shouyi
- Abstract:
- Abstract: Machine learning methods have helped to advance a wide range of scientific and technological fields in recent years, including computational chemistry. Because chemical systems can be complex and high‐dimensional, feature selection is critical but challenging when developing reliable machine learning based prediction models, especially for proteins as bio‐macromolecules. In this study, we applied the sparse group lasso (SGL) method as a general feature selection method to develop a classification model for an allosteric protein in different functional states. This results in a much improved model with comparable accuracy (Acc) and only 28 selected features, compared to 289 selected features from a previous study. The Acc achieves 91.50% with 1936 selected features, which is far higher than that of baseline methods. In addition, grouping protein amino acids into secondary structures provides additional interpretability of the selected features. The selected features are verified as associated with key allosteric residues through comparison with both experimental and computational works about the model protein, and demonstrate the effectiveness and necessity of applying rigorous feature selection and evaluation methods on complex chemical systems.
Abstract: Applying machine learning and feature selection to identify allostery‐related residues is a relatively new research direction. The division of the protein's amino acid residues into secondary structures is considered as an additional label for the feature selection procedure. The selected features are verified as associated with key allosteric residues. This article demonstrates the effectiveness of sparse group lasso as a general feature selection method for complex biomolecular systems.
- Is Part Of:
- Journal of computational chemistry. Volume 43:Issue 20(2022)
- Journal:
- Journal of computational chemistry
- Issue:
- Volume 43:Issue 20(2022)
- Issue Display:
- Volume 43, Issue 20 (2022)
- Year:
- 2022
- Volume:
- 43
- Issue:
- 20
- Issue Sort Value:
- 2022-0043-0020-0000
- Page Start:
- 1342
- Page End:
- 1354
- Publication Date:
- 2022-06-03
- Subjects:
- classification -- feature selection -- function‐related residues -- protein states -- sparse group lasso
Chemistry -- Data processing -- Periodicals
542.85
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1096-987X
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/jcc.26937
- Languages:
- English
- ISSNs:
- 0192-8651
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4963.460000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22087.xml