Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI. (19th October 2022)
- Record Type:
- Journal Article
- Title:
- Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI. (19th October 2022)
- Main Title:
- Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI
- Authors:
- Kalyakulina, Alena
Yusipov, Igor
Bacalini, Maria Giulia
Franceschi, Claudio
Vedunova, Maria
Ivanchenko, Mikhail
- Abstract:
- Background: DNA methylation has a significant effect on gene expression and can be associated with various diseases. Meta-analysis of available DNA methylation datasets requires development of a specific workflow for joint data processing. Results: We propose a comprehensive approach of combined DNA methylation datasets to classify controls and patients. The solution includes data harmonization, construction of machine learning classification models, dimensionality reduction of models, imputation of missing values, and explanation of model predictions by explainable artificial intelligence (XAI) algorithms. We show that harmonization can improve classification accuracy by up to 20% when preprocessing methods of the training and test datasets are different. The best accuracy results were obtained with tree ensembles, reaching above 95% for Parkinson's disease. Dimensionality reduction can substantially decrease the number of features, without detriment to the classification accuracy. The best imputation methods achieve almost the same classification accuracy for data with missing values as for the original data. XAI approaches have allowed us to explain model predictions from both populational and individual perspectives. Conclusions: We propose a methodologically valid and comprehensive approach to the classification of healthy individuals and patients with various diseases based on whole-blood DNA methylation data using Parkinson's disease and schizophrenia as examples. The proposed algorithm works better for the former pathology, characterized by a complex set of symptoms. It makes it possible to solve data harmonization problems for meta-analysis of many different datasets, impute missing values, and build classification models of small dimensionality.
- Is Part Of:
- GigaScience. Volume 11(2022)
- Journal:
- GigaScience
- Issue:
- Volume 11(2022)
- Issue Display:
- Volume 11, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 11
- Issue:
- 2022
- Issue Sort Value:
- 2022-0011-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-10-19
- Subjects:
- DNA methylation -- machine learning -- data harmonization -- explainable artificial intelligence
Information storage and retrieval systems -- Research -- Periodicals
Biology -- Research -- Periodicals
Medical sciences -- Research -- Periodicals
Database management -- Periodicals
570.285
- Journal URLs:
- http://www.gigasciencejournal.com/
http://www.oxfordjournals.org/
- DOI:
- 10.1093/gigascience/giac097
- Languages:
- English
- ISSNs:
- 2047-217X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24097.xml