TIGER: technical variation elimination for metabolomics data using ensemble learning architecture. Issue 2 (3rd January 2022)
- Record Type:
- Journal Article
- Title:
- TIGER: technical variation elimination for metabolomics data using ensemble learning architecture. Issue 2 (3rd January 2022)
- Main Title:
- TIGER: technical variation elimination for metabolomics data using ensemble learning architecture
- Authors:
- Han, Siyu
Huang, Jialing
Foppiano, Francesco
Prehn, Cornelia
Adamski, Jerzy
Suhre, Karsten
Li, Ying
Matullo, Giuseppe
Schliess, Freimut
Gieger, Christian
Peters, Annette
Wang-Sattler, Rui
- Abstract:
- Large metabolomics datasets inevitably contain unwanted technical variations which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of many existing methods only hold for a few specific scenarios. Some tools remove technical variations with models trained on quality control (QC) samples which may not generalize well on subject samples. Additionally, almost none of the existing methods supports datasets with multiple types of QC samples, which greatly limits their performance and flexibility. To address these issues, a non-parametric method TIGER (Technical variation elImination with ensemble learninG architEctuRe) is developed in this study and released as an R package (https://CRAN.R-project.org/package=TIGERr). TIGER integrates the random forest algorithm into an adaptable ensemble learning architecture. Evaluation results show that TIGER outperforms four popular methods with respect to robustness and reliability on three human cohort datasets constructed with targeted or untargeted metabolomics data. Additionally, a case study aiming to identify age-associated metabolites is performed to illustrate how TIGER can be used for cross-kit adjustment in a longitudinal analysis with experimental data of three time-points generated by different analytical kits. A dynamic website is developed to help evaluate the performance of TIGER and examine the patterns revealed in our longitudinal analysis (https://han-siyu.github.io/TIGER_web/). Overall, TIGER is expected to be a powerful tool for metabolomics data analysis.
- Is Part Of:
- Briefings in bioinformatics. Volume 23:Issue 2(2022)
- Journal:
- Briefings in bioinformatics
- Issue:
- Volume 23:Issue 2(2022)
- Issue Display:
- Volume 23, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 23
- Issue:
- 2
- Issue Sort Value:
- 2022-0023-0002-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-01-03
- Subjects:
- metabolomics -- machine learning -- ensemble learning -- predictive modelling -- longitudinal analysis
Genetics -- Data processing -- Periodicals
Molecular biology -- Data processing -- Periodicals
Genomes -- Data processing -- Periodicals
572.80285
- Journal URLs:
- http://bib.oxfordjournals.org
http://www.oxfordjournals.org/content?genre=journal&issn=1477-4054
http://ukcatalogue.oup.com/
http://firstsearch.oclc.org
- DOI:
- 10.1093/bib/bbab535
- Languages:
- English
- ISSNs:
- 1467-5463
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2283.958363
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20750.xml