Addressing data imbalance problems in ligand-binding site prediction using a variational autoencoder and a convolutional neural network. Issue 6 (28th July 2021)
- Record Type:
- Journal Article
- Title:
- Addressing data imbalance problems in ligand-binding site prediction using a variational autoencoder and a convolutional neural network. Issue 6 (28th July 2021)
- Main Title:
- Addressing data imbalance problems in ligand-binding site prediction using a variational autoencoder and a convolutional neural network
- Authors:
- Nguyen, Trinh-Trung-Duong
Nguyen, Duc-Khanh
Ou, Yu-Yen - Abstract:
- Abstract: Since 2015, a fast growing number of deep learning–based methods have been proposed for protein–ligand binding site prediction and many have achieved promising performance. These methods, however, neglect the imbalanced nature of binding site prediction problems. Traditional data-based approaches for handling data imbalance employ linear interpolation of minority class samples. Such approaches may not be fully exploited by deep neural networks on downstream tasks. We present a novel technique for balancing input classes by developing a deep neural network–based variational autoencoder (VAE) that aims to learn important attributes of the minority classes concerning nonlinear combinations. After learning, the trained VAE was used to generate new minority class samples that were later added to the original data to create a balanced dataset. Finally, a convolutional neural network was used for classification, for which we assumed that the nonlinearity could be fully integrated. As a case study, we applied our method to the identification of FAD- and FMN-binding sites of electron transport proteins. Compared with the best classifiers that use traditional machine learning algorithms, our models obtained a great improvement on sensitivity while maintaining similar or higher levels of accuracy and specificity. We also demonstrate that our method is better than other data imbalance handling techniques, such as SMOTE, ADASYN, and class weight adjustment. Additionally, ourAbstract: Since 2015, a fast growing number of deep learning–based methods have been proposed for protein–ligand binding site prediction and many have achieved promising performance. These methods, however, neglect the imbalanced nature of binding site prediction problems. Traditional data-based approaches for handling data imbalance employ linear interpolation of minority class samples. Such approaches may not be fully exploited by deep neural networks on downstream tasks. We present a novel technique for balancing input classes by developing a deep neural network–based variational autoencoder (VAE) that aims to learn important attributes of the minority classes concerning nonlinear combinations. After learning, the trained VAE was used to generate new minority class samples that were later added to the original data to create a balanced dataset. Finally, a convolutional neural network was used for classification, for which we assumed that the nonlinearity could be fully integrated. As a case study, we applied our method to the identification of FAD- and FMN-binding sites of electron transport proteins. Compared with the best classifiers that use traditional machine learning algorithms, our models obtained a great improvement on sensitivity while maintaining similar or higher levels of accuracy and specificity. We also demonstrate that our method is better than other data imbalance handling techniques, such as SMOTE, ADASYN, and class weight adjustment. Additionally, our models also outperform existing predictors in predicting the same binding types. Our method is general and can be applied to other data types for prediction problems with moderate-to-heavy data imbalances. … (more)
- Is Part Of:
- Briefings in bioinformatics. Volume 22:Issue 6(2021)
- Journal:
- Briefings in bioinformatics
- Issue:
- Volume 22:Issue 6(2021)
- Issue Display:
- Volume 22, Issue 6 (2021)
- Year:
- 2021
- Volume:
- 22
- Issue:
- 6
- Issue Sort Value:
- 2021-0022-0006-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07-28
- Subjects:
- variational autoencoder -- convolutional neural network -- protein–ligand binding site prediction -- data imbalance handlin -- gelectron transport proteins
Genetics -- Data processing -- Periodicals
Molecular biology -- Data processing -- Periodicals
Genomes -- Data processing -- Periodicals
572.80285 - Journal URLs:
- http://bib.oxfordjournals.org ↗
http://www.oxfordjournals.org/content?genre=journal&issn=1477-4054 ↗
http://ukcatalogue.oup.com/ ↗
http://firstsearch.oclc.org ↗ - DOI:
- 10.1093/bib/bbab277 ↗
- Languages:
- English
- ISSNs:
- 1467-5463
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 2283.958363
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 19692.xml