Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana. (April 2020)
- Record Type:
- Journal Article
- Title:
- Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana. (April 2020)
- Main Title:
- Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana
- Authors:
- Mosharaf, Md. Parvez
Hassan, Md. Mehedi
Ahmed, Fee Faysal
Khatun, Mst. Shamima
Moni, Mohammad Ali
Mollah, Md. Nurul Haque
- Abstract:
- Graphical abstract: The working flowchart of the proposed in silico prediction method is shown. Various sample ratios were considered to train the supervised learning algorithm. A cross-validation test and an independent test dataset were used to evaluate model performance.
Highlights: Ubiquitination site prediction. CKSAAP encoding scheme. Random forest classification. Arabidopsis thaliana.
Abstract: Among protein post-translational modifications (PTMs), ubiquitination is considered one of the most significant processes, regulating cellular functions and playing a role in various diseases. Identifying ubiquitination sites is therefore important for understanding the mechanisms of ubiquitination-related biological processes. Both experimental and computational approaches are available for identifying ubiquitination sites from the protein sequences of different species. The experimental approaches are time-consuming, laborious and costly. In silico prediction is a faster, easier and more cost-effective alternative. Moreover, the sequence patterns around ubiquitination sites differ between species, which calls for species-specific predictors. In this study, we therefore propose a novel computational method for identifying ubiquitination sites from protein sequences of A. thaliana that is also robust against outlying observations. Through a comparative study of two encoding schemes and three classifiers, the random forest (RF) based predictor was selected as the best predictor under the CKSAAP encoding scheme with a 1:1 ratio of positive and negative samples (i.e. ubiquitinated and non-ubiquitinated) in the training dataset. The proposed predictor achieved an area under the ROC curve (AUC) of 0.91 in the 5-fold cross-validation test on the training dataset and 0.86 on the independent test dataset of A. thaliana. The proposed RF based predictor also performed much better than the other existing ubiquitination site predictors for A. thaliana. …
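The abstract names the CKSAAP (composition of k-spaced amino acid pairs) encoding scheme without describing it. As an illustration only, a minimal Python sketch of the general technique follows. The function name `cksaap`, the maximum gap `k_max`, and the choice to normalise by the number of valid pairs are assumptions made for this example; the record does not state the authors' window size, gap range, or normalisation.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs of them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window, k_max=2):
    """Encode one sequence window as CKSAAP features (hypothetical helper).

    For each gap k in 0..k_max, count every residue pair separated by k
    positions and divide by the number of valid pairs found, giving
    400 * (k_max + 1) features. Pairs containing a non-standard symbol
    (for example a padding character) are skipped. k_max=2 is an assumed
    default, not a value taken from the article.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

In a workflow of the kind the abstract outlines, each lysine-centred sequence window would be encoded this way and the resulting vectors passed to a classifier, for instance scikit-learn's `RandomForestClassifier`; that pairing is likewise an assumption about tooling, not a detail from the record.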
- Is Part Of:
- Computational biology and chemistry. Volume 85(2020)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 85(2020)
- Issue Display:
- Volume 85, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 85
- Issue:
- 2020
- Issue Sort Value:
- 2020-0085-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-04
- Subjects:
- Arabidopsis thaliana species -- Protein sequences -- Ubiquitination sites -- CKSAAP encoding -- Random forest
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2020.107238
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13458.xml