Explainable artificial intelligence model for identifying COVID-19 gene biomarkers. (March 2023)
- Record Type:
- Journal Article
- Title:
- Explainable artificial intelligence model for identifying COVID-19 gene biomarkers. (March 2023)
- Main Title:
- Explainable artificial intelligence model for identifying COVID-19 gene biomarkers
- Authors:
- Yagin, Fatma Hilal
Cicek, İpek Balikci
Alkhateeb, Abedalrhman
Yagin, Burak
Colak, Cemil
Azzeh, Mohammad
Akbulut, Sami
- Abstract:
- Aim: COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods based on machine learning techniques to COVID-19 metagenomic next-generation sequencing (mNGS) samples.
Methods: The data set used in the study contains 15,979 gene expressions from 234 patients, of whom 141 (60.3%) were COVID-19 negative and 93 (39.7%) were COVID-19 positive. The least absolute shrinkage and selection operator (LASSO) method was applied to select genes associated with COVID-19. The Support Vector Machine - Synthetic Minority Oversampling Technique (SVM-SMOTE) method was used to handle the class imbalance problem. Logistic regression (LR), SVM, random forest (RF), and extreme gradient boosting (XGBoost) models were constructed to predict COVID-19. An explainable approach based on local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) was applied to determine COVID-19-associated biomarker candidate genes and improve the final model's interpretability.
Results: For the diagnosis of COVID-19, the XGBoost model (accuracy: 0.930) outperformed the RF (accuracy: 0.912), SVM (accuracy: 0.877), and LR (accuracy: 0.912) models. According to SHAP, the three most important genes associated with COVID-19 were IFI27, LGR6, and FAM83A. The LIME results showed that a high level of IFI27 gene expression in particular contributed to increasing the probability of the positive class.
Conclusions: The proposed model (XGBoost) was able to predict COVID-19 successfully. The results show that machine learning combined with LIME and SHAP can explain the biomarker prediction for COVID-19 and provide clinicians with an intuitive understanding and interpretability of the impact of risk factors in the model.
Highlights: Explainable AI approach for COVID-19 diagnosis based on next-generation sequencing. XGBoost is a robust method for detecting COVID-19. LIME provides local explanations of the relative importance of each gene. SHAP provides global explanations for the genes that cause COVID-19. Predictive qualities in the detection of genomic biomarkers of COVID-19 are presented.
- Is Part Of:
- Computers in biology and medicine. Volume 154(2023)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 154(2023)
- Issue Display:
- Volume 154, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 154
- Issue:
- 2023
- Issue Sort Value:
- 2023-0154-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-03
- Subjects:
- COVID-19 -- Explainable artificial intelligence -- LIME -- SHAP -- XGBoost
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2023.106619
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25961.xml