ABLE: Attention based learning for enzyme classification. (October 2021)
- Record Type:
- Journal Article
- Title:
- ABLE: Attention based learning for enzyme classification. (October 2021)
- Main Title:
- ABLE: Attention based learning for enzyme classification
- Authors:
- Nallapareddy, Mohan Vamsi
Dwivedula, Rohit
- Abstract:
- Abstract: Classifying proteins into their respective enzyme class is an interesting question for researchers for a variety of reasons. The open source Protein Data Bank (PDB) contains more than 160,000 structures, with more being added every day. This paper proposes an attention-based bidirectional-LSTM model (ABLE) trained on oversampled data generated by SMOTE to analyse and classify a protein into one of the six enzyme classes or a negative class, using only the primary structure of the protein, described as a string by the FASTA sequence, as input. We achieve the highest F1-score of 0.834 using our proposed model on a dataset of proteins from the PDB. We baseline our model against eighteen other machine learning and deep learning networks, including CNN, LSTM, Bi-LSTM, GRU, and the state-of-the-art DeepEC model. We conduct experiments with two different oversampling techniques, SMOTE and ADASYN. To corroborate the obtained results, we perform extensive experimentation and statistical testing.
Highlights: The proposed attention-BiLSTM model for classifying enzymes outperforms vanilla deep learning and machine learning models. Extensive experimentation was conducted through 10-fold cross-validation and comparison against a wide range of baselines. The class imbalance problem in the enzyme dataset was tackled with two data oversampling techniques, SMOTE and ADASYN. Statistical testing was performed to validate our results, instead of relying only on increases in performance metrics.
- Is Part Of:
- Computational biology and chemistry. Volume 94(2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 94(2021)
- Issue Display:
- Volume 94, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 94
- Issue:
- 2021
- Issue Sort Value:
- 2021-0094-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-10
- Subjects:
- Attention -- Bidirectional LSTM -- Deep learning -- Enzyme classification
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107558
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 19365.xml