A new sequence based encoding for prediction of host–pathogen protein interactions. (February 2019)
- Record Type:
- Journal Article
- Title:
- A new sequence based encoding for prediction of host–pathogen protein interactions. (February 2019)
- Main Title:
- A new sequence based encoding for prediction of host–pathogen protein interactions
- Authors:
- Kösesoy, İrfan
Gök, Murat
Öz, Cemil
- Abstract:
- Graphical abstract: Highlights: We proposed a novel and robust sequence-based feature extraction method to predict pathogen–host interactions. We applied our method (LBE) and other well-known sequence-based methods to the Bacillus anthracis and Yersinia pestis data sets. We increased the accuracy of pathogen–host interaction prediction by using our conjecture that the location of amino acids can be used as a feature to differentiate proteins. Based on the experimental results, one can conclude that our method is more successful than the other encoding methods used in this study with decision tree (RF and j48) and instance-based (kNN) classifiers.
Abstract: Pathogen–host interactions are very important for understanding the infection process at the molecular level, where pathogen proteins physically bind to human proteins to manipulate critical biological processes in the host cell. Data scarcity and data unavailability are two major problems for computational approaches to the prediction of pathogen–host interactions. Developing a computational method that predicts pathogen–host interactions with high accuracy from protein sequences alone is of great importance because it can eliminate these problems. In this study, we propose a novel and robust sequence-based feature extraction method, named Location Based Encoding, to predict pathogen–host interactions with machine learning algorithms. In this context, we use the Bacillus anthracis and Yersinia pestis data sets as the pathogen organisms and human proteins as the host model to compare our method with sequence-based protein encoding methods that are widely used in the literature, namely amino acid composition, amino acid pair, and conjoint triad. We use these encoding methods with decision tree (Random Forest, j48), statistical (Bayesian Networks, Naive Bayes), and instance-based (kNN) classifiers to predict pathogen–host interactions. We conduct different experiments to evaluate the effectiveness of our method. We obtain the best results among all the experiments with the RF classifier in terms of F1, accuracy, MCC, and AUC. …
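The abstract names three baseline encodings (amino acid composition, amino acid pair, conjoint triad) against which Location Based Encoding is compared. The sketch below illustrates those baselines under their common textbook definitions; it is not the authors' Location Based Encoding, which this record does not describe in enough detail to reproduce, and it is not their code. The seven-class grouping follows the conjoint triad method of Shen et al. (2007). The `encode_pair` helper, which concatenates pathogen and host vectors, is a hypothetical illustration of how a pair could be turned into one classifier input.

```python
from itertools import product

# The 20 standard amino acids; other letters (X, B, U, ...) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Seven amino acid classes used by the conjoint triad method (Shen et al., 2007).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CT_CLASS_OF = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}

# Fixed ordering of the 400 dipeptides for the amino acid pair encoding.
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def clean(seq):
    """Upper-case the sequence and drop non-standard residues."""
    return "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)


def amino_acid_composition(seq):
    """20-dim vector of residue frequencies."""
    s = clean(seq)
    if not s:
        return [0.0] * 20
    return [s.count(aa) / len(s) for aa in AMINO_ACIDS]


def amino_acid_pair(seq):
    """400-dim vector of adjacent-pair (dipeptide) frequencies."""
    s = clean(seq)
    vec = [0.0] * 400
    n_pairs = len(s) - 1
    for i in range(n_pairs):
        vec[PAIR_INDEX[s[i:i + 2]]] += 1
    return [v / n_pairs for v in vec] if n_pairs > 0 else vec


def conjoint_triad(seq):
    """343-dim vector of class-triad counts, normalised as (f - min) / max."""
    classes = [CT_CLASS_OF[aa] for aa in clean(seq)]
    vec = [0.0] * 343  # 7 ** 3 possible triads of consecutive classes
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        vec[a * 49 + b * 7 + c] += 1
    hi, lo = max(vec), min(vec)
    return [(v - lo) / hi for v in vec] if hi > 0 else vec


def encode_pair(pathogen_seq, host_seq, encoder):
    """Hypothetical helper: concatenate pathogen and host feature vectors."""
    return encoder(pathogen_seq) + encoder(host_seq)


print(len(encode_pair("MKTAYIAKQR", "MGSSHHHHHH", conjoint_triad)))  # 686
```

Each pair vector could then be passed to a classifier such as a random forest. The abstract does not state how the two proteins' vectors were combined, so concatenation is only one plausible choice.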
- Is Part Of:
- Computational biology and chemistry. Volume 78(2019)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 78(2019)
- Issue Display:
- Volume 78, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 78
- Issue:
- 2019
- Issue Sort Value:
- 2019-0078-2019-0000
- Page Start:
- 170
- Page End:
- 177
- Publication Date:
- 2019-02
- Subjects:
- Infectious diseases -- Host–pathogen interactions -- Protein–protein interactions -- Protein networks -- Machine learning
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2018.12.001
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 11608.xml