Target-DBPPred: An intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting. (June 2022)
- Record Type:
- Journal Article
- Title:
- Target-DBPPred: An intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting. (June 2022)
- Main Title:
- Target-DBPPred: An intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting
- Authors:
- Ali, Farman
Kumar, Harish
Patil, Shruti
Kotecha, Ketan
Banjar, Ameen
Daud, Ali - Abstract:
- Abstract: DNA-protein interaction is a critical biological process that performs influential activities, including DNA transcription and recombination. DBPs (DNA-binding proteins) are closely associated with different kinds of human diseases (asthma, cancer, and AIDS), while some of the DBPs are used in the production of antibiotics, steroids, and anti-inflammatories. Several methods have been reported for the prediction of DBPs. However, a more intelligent method is still highly desirable for the accurate prediction of DBPs. This study presents an intelligent computational method, Target-DBPPred, to improve DBPs prediction. Important features from primary protein sequences are investigated via a novel feature descriptor, called EDF-PSSM-DWT (Evolutionary difference formula position-specific scoring matrix-discrete wavelet transform) and several other multi-evolutionary methods, including F-PSSM (Filtered position-specific scoring matrix), EDF-PSSM (Evolutionary difference formula position-specific scoring matrix), PSSM-DPC (Position-specific scoring matrix-dipeptide composition), and Lead-BiPSSM (Lead-bigram-position specific scoring matrix) to encapsulate diverse multivariate features. The best feature set from the features of each descriptor is selected using sequential forward selection (SFS). Further, four models are trained using Adaboost, XGB (eXtreme gradient boosting), ERT (extremely randomized trees), and LiXGB (Light eXtreme gradient boosting) classifiers. LiXGB,Abstract: DNA-protein interaction is a critical biological process that performs influential activities, including DNA transcription and recombination. DBPs (DNA-binding proteins) are closely associated with different kinds of human diseases (asthma, cancer, and AIDS), while some of the DBPs are used in the production of antibiotics, steroids, and anti-inflammatories. Several methods have been reported for the prediction of DBPs. However, a more intelligent method is still highly desirable for the accurate prediction of DBPs. This study presents an intelligent computational method, Target-DBPPred, to improve DBPs prediction. Important features from primary protein sequences are investigated via a novel feature descriptor, called EDF-PSSM-DWT (Evolutionary difference formula position-specific scoring matrix-discrete wavelet transform) and several other multi-evolutionary methods, including F-PSSM (Filtered position-specific scoring matrix), EDF-PSSM (Evolutionary difference formula position-specific scoring matrix), PSSM-DPC (Position-specific scoring matrix-dipeptide composition), and Lead-BiPSSM (Lead-bigram-position specific scoring matrix) to encapsulate diverse multivariate features. The best feature set from the features of each descriptor is selected using sequential forward selection (SFS). Further, four models are trained using Adaboost, XGB (eXtreme gradient boosting), ERT (extremely randomized trees), and LiXGB (Light eXtreme gradient boosting) classifiers. LiXGB, with the best feature set of EDF-PSSM-DWT, has attained 6.69% and 15.07% higher performance in terms of accuracies using training and testing datasets, respectively. The obtained results verify the improved performance of our proposed predictor over the existing predictors. Graphical abstract: Image 1 Highlights: Designed a novel predictor named Target-DBPPred for prediction of DNA-binding proteins. The features are explored by EDF-PSSM-DWT, F-PSSM, PSSM-DPC, and Lead-BiPSSM. The classification is performed by LiXGB, XGB, and ERT. Target-DBPPred has secured the highest prediction results. … (more)
- Is Part Of:
- Computers in biology and medicine. Volume 145(2022)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 145(2022)
- Issue Display:
- Volume 145, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 145
- Issue:
- 2022
- Issue Sort Value:
- 2022-0145-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- DNA-Binding proteins -- eXtreme gradient boosting -- Position-specific scoring matrix -- Discrete wavelet transform
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285 - Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/ ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.compbiomed.2022.105533 ↗
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 21547.xml