A feature-based approach to predict hot spots in protein–DNA binding interfaces. (8th April 2019)
- Record Type:
- Journal Article
- Title:
- A feature-based approach to predict hot spots in protein–DNA binding interfaces. (8th April 2019)
- Main Title:
- A feature-based approach to predict hot spots in protein–DNA binding interfaces
- Authors:
- Zhang, Sijia
Zhao, Le
Zheng, Chun-Hou
Xia, Junfeng - Abstract:
- Abstract: DNA-binding hot spot residues of proteins are dominant and fundamental interface residues that contribute most of the binding free energy of protein–DNA interfaces. As experimental methods for identifying hot spots are expensive and time consuming, computational approaches are urgently required in predicting hot spots on a large scale. In this work, we systematically assessed a wide variety of 114 features from a combination of the protein sequence, structure, network and solvent accessible information and their combinations along with various feature selection strategies for hot spot prediction. We then trained and compared four commonly used machine learning models, namely, support vector machine (SVM), random forest, Naïve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and the independent test set. Our results show that (1) features based on the solvent accessible surface area have significant effect on hot spot prediction; (2) different but complementary features generally enhance the prediction performance; and (3) SVM outperforms other machine learning methods on both training and independent test sets. In an effort to improve predictive performance, we developed a feature-based method, namely, PrPDH (Prediction of Protein–DNA binding Hot spots), for the prediction of hot spots in protein–DNA binding interfaces using SVM based on the selected 10 optimal features. Comparative results on benchmark data setsAbstract: DNA-binding hot spot residues of proteins are dominant and fundamental interface residues that contribute most of the binding free energy of protein–DNA interfaces. As experimental methods for identifying hot spots are expensive and time consuming, computational approaches are urgently required in predicting hot spots on a large scale. In this work, we systematically assessed a wide variety of 114 features from a combination of the protein sequence, structure, network and solvent accessible information and their combinations along with various feature selection strategies for hot spot prediction. We then trained and compared four commonly used machine learning models, namely, support vector machine (SVM), random forest, Naïve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and the independent test set. Our results show that (1) features based on the solvent accessible surface area have significant effect on hot spot prediction; (2) different but complementary features generally enhance the prediction performance; and (3) SVM outperforms other machine learning methods on both training and independent test sets. In an effort to improve predictive performance, we developed a feature-based method, namely, PrPDH (Prediction of Protein–DNA binding Hot spots), for the prediction of hot spots in protein–DNA binding interfaces using SVM based on the selected 10 optimal features. Comparative results on benchmark data sets indicate that our predictor is able to achieve generally better performance in predicting hot spots compared to the state-of-the-art predictors. A user-friendly web server for PrPDH is well established and is freely available at http://bioinfo.ahu.edu.cn:8080/PrPDH . … (more)
- Is Part Of:
- Briefings in bioinformatics. Volume 21:Number 3(2020)
- Journal:
- Briefings in bioinformatics
- Issue:
- Volume 21:Number 3(2020)
- Issue Display:
- Volume 21, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 21
- Issue:
- 3
- Issue Sort Value:
- 2020-0021-0003-0000
- Page Start:
- 1038
- Page End:
- 1046
- Publication Date:
- 2019-04-08
- Subjects:
- machine learning -- feature selection -- protein–DNA interaction -- hot spot -- support vector machine
Genetics -- Data processing -- Periodicals
Molecular biology -- Data processing -- Periodicals
Genomes -- Data processing -- Periodicals
572.80285 - Journal URLs:
- http://bib.oxfordjournals.org ↗
http://www.oxfordjournals.org/content?genre=journal&issn=1477-4054 ↗
http://ukcatalogue.oup.com/ ↗
http://firstsearch.oclc.org ↗ - DOI:
- 10.1093/bib/bbz037 ↗
- Languages:
- English
- ISSNs:
- 1467-5463
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 2283.958363
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 15135.xml