Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning. (October 2022)
- Record Type:
- Journal Article
- Title:
- Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning. (October 2022)
- Main Title:
- Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning
- Authors:
- Yamaguchi, Shohei
Nakashima, Haruka
Moriwaki, Yoshitaka
Terada, Tohru
Shimizu, Kentaro
- Abstract:
- In this study, we developed a system that predicts the binding sites of proteins for five mononucleotides (AMP, ADP, ATP, GDP, and GTP). The system comprises two machine learning (ML)-based predictors using a convolutional neural network and a gradient boosting machine, two template-based predictors based on sequence and structure alignment, and a predictor that performs ensemble learning of these four predictors. In this study, data augmentation of ligand binding sites with similar ligand structures was performed. For example, in the prediction of ADP-binding sites using ML methods, the binding sites of AMP and ATP, which have similar structures, are considered. In addition, we constructed the structure models using AlphaFold2, a highly accurate protein prediction method. The secondary structure and dihedral angle information obtained using the model structures were used as ML predictor features. Additionally, in the template-based predictor, the structures of the binding sites were used as templates to be explored based on structure alignment to identify the binding site of the target. Consequently, the template-based predictor based on structure alignment showed the best performance among the four individual predictors, and the ensemble predictor achieved the best performance, with an area under the curve of 0.958 for all mononucleotides.
Highlights: We developed a system that predicts the binding sites of proteins for five mononucleotides (AMP, ADP, ATP, GDP, and GTP). The system uses ensemble learning of two machine learning-based predictors and sequence and structure alignment-based predictors. The system achieved performance with an area under the curve of 0.958 for all mononucleotides.
- Is Part Of:
- Computational biology and chemistry. Volume 100(2022)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 100(2022)
- Issue Display:
- Volume 100, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 100
- Issue:
- 2022
- Issue Sort Value:
- 2022-0100-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-10
- Subjects:
- Proteins -- Mononucleotide -- Binding site -- AlphaFold2 -- Structure alignment -- Machine learning
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2022.107744
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23288.xml
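
Illustrative note on the abstract: the record describes four per-residue predictors combined by an ensemble predictor and scored by area under the curve (AUC). The sketch below shows one simple way such a combination and score could be computed. It is not the authors' method: the abstract does not say which ensemble learner was used, so a weighted mean stands in for it here, and the function names (`ensemble_scores`, `roc_auc`) and all numbers in the demo are hypothetical.

```python
from typing import List, Sequence


def ensemble_scores(
    per_predictor: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-residue scores from several predictors by weighted mean.

    Assumption: a weighted mean is a stand-in for the (unspecified)
    ensemble learner mentioned in the abstract.
    """
    if len(per_predictor) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    total = sum(weights)
    n_residues = len(per_predictor[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_predictor)) / total
        for i in range(n_residues)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a binding residue (label 1) receives a
    higher score than a non-binding residue (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only: six residues, 1 = binding.
    labels = [0, 1, 1, 0, 0, 1]
    predictors = [
        [0.2, 0.7, 0.6, 0.4, 0.1, 0.5],  # hypothetical CNN scores
        [0.3, 0.8, 0.4, 0.5, 0.2, 0.6],  # hypothetical gradient boosting scores
        [0.1, 0.9, 0.7, 0.2, 0.3, 0.4],  # hypothetical sequence-template scores
        [0.2, 0.9, 0.8, 0.1, 0.1, 0.7],  # hypothetical structure-template scores
    ]
    combined = ensemble_scores(predictors, weights=[1.0, 1.0, 1.0, 1.0])
    print("ensemble AUC on toy data:", roc_auc(labels, combined))
```

Equal weights are used in the demo only for simplicity; a learned combiner would fit the weights on held-out data.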