ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations. (August 2021)
- Record Type:
- Journal Article
- Title:
- ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations. (August 2021)
- Main Title:
- ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations
- Authors:
- Taju, Semmy Wellem
Shah, Syed Muazzam Ali
Ou, Yu-Yen
- Abstract:
- Graphical abstract: Highlights: Active transport mechanisms regulate ions or small molecules across a cell membrane. Primary and secondary active transport utilize energy to move substances. A Support Vector Machine with Bidirectional Encoder Representations from Transformers embeddings is proposed to represent proteins. Multiple meanings for the same amino acid are captured to reveal the importance of specific residues. A method is developed to classify proteins into a type of active transporter using pre-trained contextual embeddings.
Abstract: Motivation: Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a powerful transfer-learning model: a deep learning language representation model developed by Google and one of the highest-performing pre-trained models for Natural Language Processing (NLP) tasks. The idea of transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. The contextualized word representations of proteins are therefore introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence.
Results: The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves an accuracy of 85.44 %, 88.74 % and 92.84 % for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach can outperform other feature extraction methods using context information, effectively classify the two types of active transport, and improve the overall performance.
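The pipeline the abstract describes (fixed feature vectors pooled from pre-trained contextual embeddings, then an SVM classifier) can be sketched as follows. This is a minimal illustration, not the authors' code: random arrays stand in for per-residue BERT hidden states, and all names, dimensions, and labels are hypothetical.

```python
# Hypothetical sketch of the abstract's pipeline: per-residue contextual
# embeddings (random stand-ins for BERT hidden states) are mean-pooled
# into fixed-length protein vectors and classified with an SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def pool_embeddings(per_residue):
    # (seq_len, hidden_dim) -> (hidden_dim,) fixed feature vector
    return per_residue.mean(axis=0)

# Fake "BERT" outputs: 40 proteins of varying length, 768-dim residues.
X = np.stack([
    pool_embeddings(rng.normal(size=(rng.integers(50, 300), 768)))
    for _ in range(40)
])
# Illustrative binary labels: primary (0) vs secondary (1) transporter.
y = rng.integers(0, 2, size=40)

clf = SVC(kernel="rbf").fit(X, y)
preds = clf.predict(X)
```

In the actual method the embeddings would come from a protein language model's hidden layers rather than a random generator; the pooling-plus-SVM stage is the part this sketch illustrates.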
- Is Part Of:
- Computational biology and chemistry. Volume 93 (2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 93 (2021)
- Issue Display:
- Volume 93, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 93
- Issue:
- 2021
- Issue Sort Value:
- 2021-0093-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-08
- Subjects:
- Contextual representations -- Contextualized word embeddings -- Active transport -- Primary active transport -- Secondary active transport -- Membrane proteins
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107537
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 17800.xml