Beyond standard pipeline and p < 0.05 in pathway enrichment analyses. (June 2021)
- Record Type:
- Journal Article
- Title:
- Beyond standard pipeline and p < 0.05 in pathway enrichment analyses. (June 2021)
- Main Title:
- Beyond standard pipeline and p < 0.05 in pathway enrichment analyses
- Authors:
- Li, Wentian
Shih, Andrew
Freudenberg-Hua, Yun
Fury, Wen
Yang, Yaning
- Abstract:
- Highlights: Statisticians have been calling attention to prudent use of p-values for years. However, p-value centric approach is common in computational and statistical biosciences. We illustrate the problem with the practice of using p-value only in pathway enrichment analysis. We point out one particular issue that many "good" p-values in over-representation analysis can be traced to the fact of a large number of human genes (N = 20,000). Six cautions are provided beyond the standard pipeline in pathway enrichment analysis. We propose possible recommendations with the goal of helping the translation of computational and statistical analysis results to biological conclusions.
Abstract: A standard pathway/gene-set enrichment analysis, the over-representation analysis, is based on four values: the size of two gene-sets, size of their overlap, and size of the gene universe from which the gene-sets are chosen. The standard result of such an analysis is based on the p-value of a statistical test. We supplement this standard pipeline by six cautions: (1) any p-value threshold to distinguish enriched gene-sets from not-enriched ones is to certain degree arbitrary; (2) genes in a gene-set may be correlated, which potentially overcount the gene-set size; (3) any attempt to impose multiple testing correction will increase the false negative rate; (4) gene-sets in a gene-set database may be correlated, potentially overcount the factor for multiple testing correction; (5) the discrete nature of the data make it possible that a minimum change in counts may lead to a quantum change in the p-value threshold-based conclusion; (6) the two gene-sets may not be chosen from the universe of all human genes, but in fact from a subset of that universe, or even two different subsets of all genes. Careful reconsideration of these issues can have an impact on an enrichment analysis conclusion. Part of our cautions mirror the call from statistician that reaching conclusion from data is not a simple matter of p-value smaller than 0.05, but a thoughtful process with due diligences. …
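Illustrative note (not part of the catalogued record): the four values named in the abstract can be turned into a p-value as sketched below. This is a minimal sketch that assumes the test is a one-sided hypergeometric test, which the abstract does not name explicitly. The function name `ora_p_value` and all counts other than the 20,000-gene universe are hypothetical and chosen only to show how the universe size (caution 6) and a one-gene change in the overlap (caution 5) move the result.

```python
from math import comb


def ora_p_value(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """Return P(X >= overlap) for X ~ Hypergeometric(universe, size_a, size_b).

    Assumption: over-representation is scored with a one-sided
    hypergeometric test; the record itself does not name the test.
    """
    smaller, larger = min(size_a, size_b), max(size_a, size_b)
    if not (0 <= overlap <= smaller and larger <= universe):
        raise ValueError("need 0 <= overlap <= set sizes <= universe")
    # Number of ways to draw size_b genes from the universe.
    total = comb(universe, size_b)
    # Draws that share at least `overlap` genes with the first gene-set.
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, smaller + 1)
    )
    return tail / total


# Hypothetical counts, chosen only for illustration.
p_all_genes = ora_p_value(20_000, 200, 100, 5)  # universe = all human genes
p_subset = ora_p_value(5_000, 200, 100, 5)      # same counts, smaller universe
p_one_fewer = ora_p_value(20_000, 200, 100, 4)  # overlap reduced by one gene

print(f"universe 20,000, overlap 5: p = {p_all_genes:.3g}")
print(f"universe  5,000, overlap 5: p = {p_subset:.3g}")
print(f"universe 20,000, overlap 4: p = {p_one_fewer:.3g}")
```

With identical gene-set sizes and overlap, the larger universe gives the smaller p-value, and removing a single gene from the overlap raises it, so the choice of universe and the discreteness of the counts can both change a threshold-based conclusion.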
- Is Part Of:
- Computational biology and chemistry. Volume 92(2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 92(2021)
- Issue Display:
- Volume 92, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 92
- Issue:
- 2021
- Issue Sort Value:
- 2021-0092-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-06
- Subjects:
- Statistical significance -- Gene-set enrichment -- Pathway analysis -- Human genes -- Pipelines
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107455
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 16977.xml