Toward a gold standard for benchmarking gene set enrichment analysis. Issue 1 (9th March 2020)
- Record Type:
- Journal Article
- Title:
- Toward a gold standard for benchmarking gene set enrichment analysis. Issue 1 (9th March 2020)
- Main Title:
- Toward a gold standard for benchmarking gene set enrichment analysis
- Authors:
- Geistlinger, Ludwig
Csaba, Gergely
Santarelli, Mara
Ramos, Marcel
Schiffer, Lucas
Turaga, Nitesh
Law, Charity
Davis, Sean
Carey, Vincent
Morgan, Martin
Zimmer, Ralf
Waldron, Levi
- Abstract:
- Motivation: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets.
Results: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance.
Availability: http://bioconductor.org/packages/GSEABenchmarkeR
Contact: ludwig.geistlinger@sph.cuny.edu …
- Is Part Of:
- Briefings in bioinformatics. Volume 22:Issue 1(2021)
- Journal:
- Briefings in bioinformatics
- Issue:
- Volume 22:Issue 1(2021)
- Issue Display:
- Volume 22, Issue 1 (2021)
- Year:
- 2021
- Volume:
- 22
- Issue:
- 1
- Issue Sort Value:
- 2021-0022-0001-0000
- Page Start:
- 545
- Page End:
- 556
- Publication Date:
- 2020-03-09
- Subjects:
- gene set analysis -- pathway analysis -- gene expression data -- microarray -- RNA-seq
Genetics -- Data processing -- Periodicals
Molecular biology -- Data processing -- Periodicals
Genomes -- Data processing -- Periodicals
572.80285
- Journal URLs:
- http://bib.oxfordjournals.org
http://www.oxfordjournals.org/content?genre=journal&issn=1477-4054
http://ukcatalogue.oup.com/
http://firstsearch.oclc.org
- DOI:
- 10.1093/bib/bbz158
- Languages:
- English
- ISSNs:
- 1467-5463
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2283.958363
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15745.xml