Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses. (December 2022)
- Record Type:
- Journal Article
- Title:
- Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses. (December 2022)
- Main Title:
- Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses
- Authors:
- Koca, Mehmet Burak
Nourani, Esmaeil
Abbasoğlu, Ferda
Karadeniz, İlknur
Sevilgen, Fatih Erdoğan
- Abstract:
- Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step towards understanding infection mechanisms. Analysis of the PHI networks is important for the determination of pathogenic diseases. Prediction of these interactions is a popular problem since experimental detection of PHIs is both time-consuming and expensive. The available methods use biological features like amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks increase the performance of the predictions. Basic network projections, random-walk-based models, or graph neural networks are used for generating topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences using the Doc2Vec and Byte Pair Encoding methods. The amino acid embeddings are used as node features while training a modified GraphSAGE model, which is an improved version of the graph convolutional network. Lastly, the hybrid protein embeddings are used for training a binary interaction classifier model that predicts whether there is an interaction between the given two proteins or not. The proposed method is evaluated with comprehensive experiments to test its functionality and compare it with the state-of-the-art methods. The experimental results on the benchmark dataset prove the efficiency of the proposed model by having a 3–23% better area under the curve (AUC) score than its competitors.
Highlights: Many computational studies have investigated protein-protein interactions (PPIs) until now. However, the majority of the previous approaches consider intra-species interactions and not the inter-species interactions between pathogen and host proteins. The main contribution of this study is the novelty of the Graph Convolutional Network (GCN) based architecture to predict virus-human protein-protein interactions. To the best of our knowledge, this is the first study that utilizes graph convolutional networks for PHI prediction. Different viruses, including SARS-CoV-2, are utilized as holdout sets to evaluate the performance of the proposed method on novel viruses. Predicting interactions for emerging viruses can be challenging due to the lack of available graph structures utilized by GCNs. In this paper, to overcome this challenge, a preliminary phase is introduced to enrich the graph before the GCN training, which leads to a significant improvement. The presented method is compared with the state-of-the-art methods on benchmark datasets to evaluate its performance. The performance result of the comparative test shows that the proposed method is better than its closest competitor by 1–21% in all seven evaluation metrics.
- Is Part Of:
- Computational biology and chemistry. Volume 101(2022)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 101(2022)
- Issue Display:
- Volume 101, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 101
- Issue:
- 2022
- Issue Sort Value:
- 2022-0101-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-12
- Subjects:
- PHI networks -- Graph convolutional networks -- Protein-protein interaction prediction
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2022.107755
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 24382.xml
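
Illustrative note on the abstract: the second and third pipeline stages it describes (sequence embeddings used as node features for a GraphSAGE-style model, then hybrid embeddings fed to a pairwise classifier) might be sketched roughly as below. This is a minimal sketch and not the authors' implementation. The random vectors stand in for Doc2Vec/Byte Pair Encoding features, the weights are untrained, and the names `sage_mean_layer` and `pair_features`, the mean aggregator, and concatenation as the pair representation are all assumptions made for illustration.

```python
import numpy as np

def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer with a mean aggregator (forward pass only)."""
    agg = np.zeros_like(features)
    for node, nbrs in enumerate(neighbors):
        if nbrs:  # isolated nodes keep an all-zero neighbor summary
            agg[node] = features[nbrs].mean(axis=0)
    # Combine each node's own features with its neighborhood summary, then ReLU.
    hidden = np.maximum(features @ w_self + agg @ w_neigh, 0.0)
    # L2-normalize each embedding; all-zero rows stay zero.
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    return hidden / np.maximum(norms, 1e-12)

def pair_features(embeddings, pairs):
    """Concatenate the two protein embeddings of each candidate pair (assumed encoding)."""
    return np.array([np.concatenate([embeddings[u], embeddings[v]]) for u, v in pairs])

rng = np.random.default_rng(0)

# Stage 1 stand-in: random vectors in place of sequence-derived embeddings.
seq_features = rng.normal(size=(5, 8))

# Toy interaction graph as adjacency lists. Node 4 has no known edges,
# mimicking a protein from a novel virus.
neighbors = [[1, 2], [0], [0, 3], [2], []]

# Stage 2: hybrid (sequence + topology) embeddings from one untrained layer.
w_self = rng.normal(size=(8, 4))
w_neigh = rng.normal(size=(8, 4))
hybrid = sage_mean_layer(seq_features, neighbors, w_self, w_neigh)

# Stage 3 input: one feature row per candidate protein pair.
X = pair_features(hybrid, [(0, 4), (1, 3)])
```

In a full pipeline, rows like those in `X` would be passed, together with interaction labels, to a binary classifier. The isolated node 4 receives no neighborhood information here, which illustrates the difficulty with emerging viruses that the highlights mention.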