Optimal prediction of viral host from genomic datasets using ensemble classifier. (January 2023)
- Record Type:
- Journal Article
- Title:
- Optimal prediction of viral host from genomic datasets using ensemble classifier. (January 2023)
- Main Title:
- Optimal prediction of viral host from genomic datasets using ensemble classifier
- Authors:
- Kathavate, Pravin Narayan
- Abstract:
- Highlights: This research work introduces a novel viral host prediction model for genomic datasets with three major phases: (a) pre-processing, (b) feature extraction, and (c) prediction. The collected raw genomic datasets are first pre-processed through data cleaning operations. Statistical features, higher-order statistical features, weighted holoentropy, chi-squared features, relief-based features, and symmetric uncertainty-based features are then extracted from the pre-processed data. An ensemble technique comprising SVM, NN, RF, and an optimized CNN is used for the prediction. The extracted features are fed as input to the SVM, NN, and RF classifiers, and the outputs of these classifiers serve as input to the optimized CNN, which provides the final result. To enhance the prediction accuracy of the CNN, its weights are fine-tuned using AFPA, an improved version of the standard FPA. In terms of accuracy, AFPA+EC exhibited the highest value, which indicates that AFPA+EC is sufficient for accurately predicting the host of the virus. The proposed model is evaluated in terms of specificity, sensitivity, accuracy, precision, FPR, FNR, NPV, FDR, F1-Score, and MCC.
Abstract: Viruses are common biological agents that are thought to be the world's greatest repositories of undiscovered genetic diversity. One of the common problems in bioinformatics is gene-disease prediction. Techniques for determining the taxonomic classification, host range, and biological properties of newly discovered viruses are needed for complete functional characterization and annotation. Understanding the behaviors and interactions of microbial populations requires research into virus-host infectious associations. This research work introduces a novel viral host prediction method using genomic datasets with three main steps: (a) pre-processing, (b) feature extraction, and (c) prediction. In the first stage, raw genomic datasets undergo pre-processing, which includes data cleaning activities.
The pre-processed data is then used to extract the statistical features, higher-order statistical features, weighted holoentropy, chi-squared features, relief-based features, and symmetric uncertainty-based features. The ensemble approach, which uses the Support Vector Machine (SVM), Neural Network (NN), Random Forest (RF), and Convolutional Neural Network (CNN), is then employed for the prediction. Here, the extracted features are fed as input to the SVM, NN, and RF classifiers. The outputs of these classifiers are fed into an optimised CNN, which produces the final prediction outcome. Additionally, the Adaptive Flower Pollination Algorithm (AFPA), an upgraded variant of the conventional Flower Pollination Algorithm (FPA), is used to fine-tune the weights of the optimised CNN in order to increase prediction accuracy. The F1-Score of AFPA+EC is 0.75026, which is 61%, 52.4%, 20.2%, and 26.5% better than the conventional methods SVM, KNN, RF, and CNN, respectively. …
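The evaluation measures named in the highlights (specificity, sensitivity, accuracy, precision, FPR, FNR, NPV, FDR, F1-Score, and MCC) all follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions and assumes a binary host/non-host labelling; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the article.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives. A zero denominator yields 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    # Denominator of the Matthews correlation coefficient (MCC).
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
        "FPR": ratio(fp, fp + tn),           # 1 - specificity
        "FNR": ratio(fn, fn + tp),           # 1 - sensitivity
        "NPV": ratio(tn, tn + fn),           # negative predictive value
        "FDR": ratio(fp, fp + tp),           # 1 - precision
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Illustrative counts only (hypothetical, not results from the article).
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Several of the measures are complements of one another (FPR and specificity, FNR and sensitivity, FDR and precision), so reporting all ten mainly serves readability; MCC is the only one that uses all four counts symmetrically.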
- Is Part Of:
- Advances in engineering software. Volume 175(2023)
- Journal:
- Advances in engineering software
- Issue:
- Volume 175(2023)
- Issue Display:
- Volume 175, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 175
- Issue:
- 2023
- Issue Sort Value:
- 2023-0175-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-01
- Subjects:
- Virus-Host Prediction -- Statistical -- Higher order Statistical Feature Extraction -- Adaptive Flower Pollination Algorithm -- Ensemble Classifier
Computer-aided engineering -- Periodicals
Engineering -- Computer programs -- Periodicals
Engineering -- Software -- Periodicals
Periodicals
620.0028553
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09659978
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.advengsoft.2022.103273
- Languages:
- English
- ISSNs:
- 0965-9978
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 0705.450000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24451.xml