SubFeat: Feature subspacing ensemble classifier for function prediction of DNA, RNA and protein sequences. (June 2021)
- Record Type:
- Journal Article
- Title:
- SubFeat: Feature subspacing ensemble classifier for function prediction of DNA, RNA and protein sequences. (June 2021)
- Main Title:
- SubFeat: Feature subspacing ensemble classifier for function prediction of DNA, RNA and protein sequences
- Authors:
- Haque, H.M. Fazlul
Rafsanjani, Muhammod
Arifin, Fariha
Adilina, Sheikh
Shatabda, Swakkhar
- Abstract:
- Highlights: Python-based package. Ensemble method for classification. Handles DNA, RNA and protein sequences. Feature subspace method.
Abstract: The information of a cell is primarily contained in deoxyribonucleic acid (DNA). DNA information flows to protein sequences via ribonucleic acid (RNA) through transcription and translation. These entities are vital to the genetic process, and recent developments in epigenetics further underline the importance of the genetic material and of knowing its attributes and functions. However, knowledge of these entities' features and functions is still growing slowly because the in vitro experimental methods used to determine them are time-consuming and expensive. In this paper, we propose an ensemble classification algorithm called SubFeat to predict the functions of biological entities from different types of datasets. Our model uses a novel feature subspace-based ensemble method: it divides the feature space into subspaces, each of which is used to train an individual base classifier, and the ensemble combines these base classifiers through a weighted majority voting mechanism. SubFeat was tested on four datasets comprising two DNA, one RNA and one protein dataset, and it outperformed all the existing single and ensemble classifiers. SubFeat is available as a Python-based tool, together with a user manual, and is freely accessible at https://github.com/fazlulhaquejony/SubFeat .
- Is Part Of:
- Computational biology and chemistry. Volume 92 (2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 92 (2021)
- Issue Display:
- Volume 92, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 92
- Issue:
- 2021
- Issue Sort Value:
- 2021-0092-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-06
- Subjects:
- Feature subspacing -- Ensemble classifier -- Biological entities -- Machine learning -- Classification
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107489
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 16987.xml
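The abstract describes the SubFeat idea in outline: divide the feature space into subspaces, train one base classifier per subspace, and combine the base classifiers by weighted majority voting. The following is a minimal plain-Python sketch of that general scheme, not code from the SubFeat package or its API. The contiguous feature split, the nearest-centroid base learner, and the use of held-out accuracy as each vote's weight are all assumptions chosen for brevity, and the names `split_features`, `NearestCentroid` and `SubspaceEnsemble` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Sequence[float]


def split_features(n_features: int, n_subspaces: int) -> List[List[int]]:
    """Partition feature indices 0..n_features-1 into contiguous, disjoint chunks."""
    if not 1 <= n_subspaces <= n_features:
        raise ValueError("need 1 <= n_subspaces <= n_features")
    size, extra = divmod(n_features, n_subspaces)
    chunks, start = [], 0
    for i in range(n_subspaces):
        # The first `extra` chunks get one additional feature.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(range(start, end)))
        start = end
    return chunks


def project(X: Sequence[Row], cols: List[int]) -> List[List[float]]:
    """Keep only the columns that belong to one subspace."""
    return [[row[j] for j in cols] for row in X]


class NearestCentroid:
    """Toy base learner (assumption): predicts the class whose mean vector is closest."""

    def fit(self, X: Sequence[Row], y: Sequence[Hashable]) -> "NearestCentroid":
        sums: Dict[Hashable, List[float]] = {}
        counts: Dict[Hashable, int] = defaultdict(int)
        for row, label in zip(X, y):
            total = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                total[j] += v
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in vec] for c, vec in sums.items()}
        return self

    def predict_one(self, row: Row) -> Hashable:
        # Squared Euclidean distance; ties go to the class seen first in training.
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, self.centroids[c])),
        )


class SubspaceEnsemble:
    """Hypothetical ensemble: one base learner per subspace, weighted majority vote."""

    def __init__(self, n_subspaces: int):
        self.n_subspaces = n_subspaces
        self.members: List[Tuple[List[int], NearestCentroid, float]] = []

    def fit(self, X_train, y_train, X_val, y_val) -> "SubspaceEnsemble":
        self.members = []
        for cols in split_features(len(X_train[0]), self.n_subspaces):
            model = NearestCentroid().fit(project(X_train, cols), y_train)
            preds = [model.predict_one(r) for r in project(X_val, cols)]
            # Assumed weighting scheme: the member's accuracy on held-out data.
            weight = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
            self.members.append((cols, model, weight))
        return self

    def predict_one(self, row: Row) -> Hashable:
        votes: Dict[Hashable, float] = defaultdict(float)
        for cols, model, weight in self.members:
            votes[model.predict_one([row[j] for j in cols])] += weight
        # The class with the largest summed weight wins.
        return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy data: features 0-1 separate the classes; features 2-3 carry no class signal.
    X_train = [[0.0, 0.1, 5.0, 5.0], [0.2, 0.0, 0.0, 0.0],
               [1.0, 0.9, 5.0, 5.0], [0.9, 1.1, 0.0, 0.0]]
    y_train = ["a", "a", "b", "b"]
    X_val = [[0.1, 0.1, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    y_val = ["a", "b"]
    ens = SubspaceEnsemble(n_subspaces=2).fit(X_train, y_train, X_val, y_val)
    print([w for _, _, w in ens.members])           # → [1.0, 0.5]
    print(ens.predict_one([0.95, 0.95, 5.0, 5.0]))  # → b
```

In the toy run, the uninformative subspace still casts a vote, but its lower weight lets the informative subspace decide the outcome. A contiguous split keeps the sketch deterministic; a random split of the feature indices would be an equally simple alternative for `split_features`.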