Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models. (August 2017)
- Record Type:
- Journal Article
- Title:
- Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models. (August 2017)
- Main Title:
- Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models
- Authors:
- Fang, Xingang
Bagui, Sikha
Bagui, Subhash
- Abstract:
- Highlights: Signature fingerprints with bond order information perform better in the QSAR study. Logistic regression exhibits high predictive performance in the virtual screening test. The successful combination suggests a feasible selection strategy on similar targets.
Abstract: The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining of small molecules in a variety of biological systems using machine learning techniques. From the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structure-activity relationship (QSAR) model. To develop a systematic descriptor selection strategy, we need to understand the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200 K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy on similar targets.
- Is Part Of:
- Computational biology and chemistry. Volume 69(2017)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 69(2017)
- Issue Display:
- Volume 69, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 69
- Issue:
- 2017
- Issue Sort Value:
- 2017-0069-2017-0000
- Page Start:
- 110
- Page End:
- 119
- Publication Date:
- 2017-08
- Subjects:
- Virtual screening -- Machine learning -- QSAR -- PubChem -- Logistic regression -- Signature descriptor
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2017.05.007
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 2907.xml