PredForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance. (October 2021)
- Record Type:
- Journal Article
- Title:
- PredForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance. (October 2021)
- Main Title:
- PredForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance
- Authors:
- Islam, Md Khaled Ben
Rahman, Julia
Hasan, Md. Al Mehedi
Ahmad, Shamim
- Abstract:
- Highlights:
Representative feature development: Integrated multiple elementary biological sequence encoding techniques to develop a more informative representation of target formylation sites.
Imbalance handling: Optimized the decision function of the underlying learning algorithm to handle the imbalance between formylation and non-formylation sites and improve prediction quality.
Choice of classifier and model development: Investigated multiple diverse yet powerful classifiers and developed a novel formylation site prediction tool, named predForm-Site, with higher predictive accuracy than existing state-of-the-art formylation site prediction methods.
Web server development: A ready-to-go server is deployed at http://103.99.176.239:8080/predForm-Site for fast exploration of formylation sites.
Abstract: Formylation is one of the newly discovered post-translational modifications of lysine residues, and it is responsible for different kinds of diseases. In this work, a novel predictor, named predForm-Site, has been developed to predict formylation sites with higher accuracy. We have integrated multiple sequence features to develop a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier has been optimized on the skewed formylation dataset during prediction model training to improve prediction quality. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity and 99.8% overall accuracy with an AUC of 0.999 in the jackknife test. In the independent test, it also achieved more than 97% sensitivity and 99% specificity. Similarly, in benchmarking against the recent method CKSAAP_FormSite, the proposed predictor performed significantly better on all measures, particularly sensitivity by around 20%, specificity by nearly 30% and overall accuracy by more than 22%. These experimental results show that the proposed predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For the convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.
- Is Part Of:
- Computational biology and chemistry. Volume 94(2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 94(2021)
- Issue Display:
- Volume 94, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 94
- Issue:
- 2021
- Issue Sort Value:
- 2021-0094-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-10
- Subjects:
- Lysine formylation sites prediction -- Feature integration -- Data imbalance issue -- Support vector machine
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107553
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 19590.xml