Ensembling machine learning models to boost molecular affinity prediction. (August 2021)
- Record Type:
- Journal Article
- Title:
- Ensembling machine learning models to boost molecular affinity prediction. (August 2021)
- Main Title:
- Ensembling machine learning models to boost molecular affinity prediction
- Authors:
- Druchok, Maksym
Yarish, Dzvenymyra
Garkot, Sofiya
Nikolaienko, Tymofii
Gurbych, Oleksandr
- Abstract:
- Highlights: We propose a machine learning-based predictor for protein-ligand binding affinities. The pipeline unites two consecutive ensembles – classification and regression. With this approach, both the binding class and the binding strength can be assessed. We show that the use of diverse methods improves the prediction metrics.
Abstract: This study unites six popular machine learning approaches to enhance the prediction of molecular binding affinity between receptors (large protein molecules) and ligands (small organic molecules). Here we examine a scheme where the affinity of ligands is predicted against a single receptor – human thrombin; thus, the models consider ligand features only. However, the suggested approach can be repurposed for other receptors. The methods include Support Vector Machine, Random Forest, CatBoost, feed-forward neural network, graph neural network, and Bidirectional Encoder Representations from Transformers. The first five methods use input features based on physico-chemical properties of molecules, while the last one is based on textual molecular representations. None of the approaches relies on atomic spatial coordinates, which avoids a potential bias from known structures and allows generalization to compounds with unknown conformations. Within each of the methods, we have trained two models that solve classification and regression tasks. Then, all models are grouped into a pipeline of two consecutive ensembles. The first ensemble aggregates six classification models which vote whether a ligand binds to a receptor or not. If a ligand is classified as active (i.e., binds), the second ensemble predicts its binding affinity in terms of the inhibition constant Ki. …
- Is Part Of:
- Computational biology and chemistry. Volume 93(2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 93(2021)
- Issue Display:
- Volume 93, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 93
- Issue:
- 2021
- Issue Sort Value:
- 2021-0093-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-08
- Subjects:
- Binding affinity -- Human thrombin -- Ensembled prediction -- Machine learning -- Deep neural networks
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107529
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 17800.xml
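
The two-stage pipeline described in the abstract (a classification ensemble that votes on whether a ligand binds, followed by a regression ensemble that estimates the affinity of ligands voted active) might be sketched as below. This is only an illustration, not the authors' implementation: the record does not say how votes or regression outputs are aggregated, so the strict majority vote, the treatment of ties as inactive, and the plain mean are all assumptions. The name `predict_affinity` and the toy models are hypothetical stand-ins for the six trained models.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

# Hypothetical model interfaces: each model maps ligand features to a prediction.
Classifier = Callable[[Sequence[float]], bool]   # True means "active" (binds)
Regressor = Callable[[Sequence[float]], float]   # predicted affinity value


def predict_affinity(
    features: Sequence[float],
    classifiers: Sequence[Classifier],
    regressors: Sequence[Regressor],
) -> Optional[float]:
    """Return an ensembled affinity estimate, or None if voted inactive."""
    # Stage 1: every classifier casts one vote on whether the ligand binds.
    votes = sum(1 for clf in classifiers if clf(features))
    # Assumption: a strict majority is required; a tie counts as inactive.
    if votes * 2 <= len(classifiers):
        return None
    # Stage 2 (assumption): average the regressors' predictions.
    return mean(reg(features) for reg in regressors)


# Toy stand-ins for trained models (purely illustrative).
toy_classifiers = [lambda x: x[0] > 0.5] * 4 + [lambda x: False] * 2
toy_regressors = [lambda x: 1.0, lambda x: 3.0]

active_result = predict_affinity([0.9], toy_classifiers, toy_regressors)
inactive_result = predict_affinity([0.1], toy_classifiers, toy_regressors)
```

With these toy models, the first ligand collects four of six votes and is passed on to the regressors, while the second collects none and is reported as inactive (`None`).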