DeepAdd: Protein function prediction from k-mer embedding and additional features. (December 2020)
- Record Type:
- Journal Article
- Title:
- DeepAdd: Protein function prediction from k-mer embedding and additional features. (December 2020)
- Main Title:
- DeepAdd: Protein function prediction from k-mer embedding and additional features
- Authors:
- Du, Zhihua
He, Yufeng
Li, Jianqiang
Uversky, Vladimir N.
- Abstract:
- Graphical abstract: DeepAdd consists of two CNN models with multiple convolution blocks that map the input protein sequence to two feature-vector representations. One representation captures the sequence similarity profile (SSP model). The other captures the PPI network (PPI model). DeepAdd uses a hierarchical classification method to classify all candidate GO terms of each query protein.
Highlights: DeepAdd is proposed to predict protein functions using a deep convolutional neural network (CNN) framework. DeepAdd uses a Word2Vec method to define the set of features that represent a protein. DeepAdd consists of two CNN models with multiple convolution blocks that map the input protein sequence to two feature-vector representations. One representation captures the sequence similarity profile (SSP model). The other captures the PPI network (PPI model).
Abstract: With the application of new high-throughput sequencing technology, a large number of protein sequences are becoming available. Determining the functional characteristics of these proteins by experiment is expensive and time-consuming. Furthermore, at the organismal level, this kind of experimental functional analysis can be conducted only for a very few selected model organisms. Computational function prediction methods can be used to fill this gap. The functions of proteins are classified by Gene Ontology (GO), which contains more than 40,000 classifications in three domains: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC). Additionally, since proteins have many functions, function prediction is a multi-label, multi-class problem. We developed a new method to predict protein function from sequence. To this end, a natural language model was used to generate word embeddings of the sequence, from which features were learned by deep learning, together with additional features to locate every protein. Our method uses the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and observe a noticeable improvement over several algorithms, such as FFPred, DeepGO, GoFDR and other methods compared on the CAFA3 datasets.
- Is Part Of:
- Computational biology and chemistry. Volume 89(2020)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 89(2020)
- Issue Display:
- Volume 89, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 89
- Issue:
- 2020
- Issue Sort Value:
- 2020-0089-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-12
- Subjects:
- Protein function prediction -- Convolution neural network -- Natural language process -- Protein-protein interaction network -- Sequence similarity profile
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2020.107379
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15192.xml