A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species. (December 2019)
- Record Type:
- Journal Article
- Title:
- A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species. (December 2019)
- Main Title:
- A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species
- Authors:
- Mishra, Sarthak
Rastogi, Yash Pratap
Jabin, Suraiya
Kaur, Punit
Amir, Mohammad
Khatun, Shabnam
- Abstract:
- Highlights: A deep learning ensemble for protein function prediction of 9 bacterial phyla into multi-class and multi-valued labels. A novel method for predicting the molecular function of bacterial species, from dataset generation to classification. A bacterial phyla dataset with sequence, physicochemical, annotation, and sub-sequence based features. This dataset has 9890 features and 1739 GO terms as multi-class labels for protein function prediction of the bacterial protein sequences. It is a unique method for protein function prediction of bacterial species and can be extended to other species as well.
Abstract: Protein function prediction is a crucial task in the post-genomics era because of the diverse, irreplaceable roles that proteins play in a biological system. Traditional methods relied on cost-intensive and time-consuming molecular biology techniques, but they proved ineffective after the surge of sequencing data that followed the advent of cost-effective and advanced sequencing techniques. To keep the pace of annotation in line with that of data generation, there has been a shift to computational approaches based on homology, sequence and structure-based features, protein-protein interaction networks, phylogenetic profiles, and physicochemical properties. A combination of these features has proven promising for improving the accuracy of protein function prediction. In the present work, we have employed a combination of sequence, physicochemical property, subsequence, and annotation features, with a total of 9890 features extracted and/or calculated for 171,212 reviewed prokaryotic proteins of 9 bacterial phyla from UniProtKB, to train a supervised deep learning ensemble model with the aim of categorizing a bacterial hypothetical/unreviewed protein's function into 1739 GO terms as functional classes. The proposed system, being fully dedicated to bacterial organisms, is a novel attempt among the various existing machine learning based protein function prediction systems, which are based on mixed organisms. Experimental results demonstrate the success of the proposed deep learning ensemble model, based on the deep neural network method, with an F1 measure of 0.7912 on the prepared Test dataset 1 of reviewed proteins. …
- Is Part Of:
- Computational biology and chemistry. Volume 83(2019)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 83(2019)
- Issue Display:
- Volume 83, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 83
- Issue:
- 2019
- Issue Sort Value:
- 2019-0083-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-12
- Subjects:
- Hypothetical proteins -- Function prediction -- Molecular function -- Deep learning -- Reviewed protein -- Motif -- Physicochemical feature -- LeakyRelu -- Nadam -- Deep neural network -- Sequence based feature -- Annotation based feature
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2019.107147
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23133.xml