Evaluating Gene Set Enrichment Analysis via a Hybrid Data Model. (January 2014)
- Record Type:
- Journal Article
- Title:
- Evaluating Gene Set Enrichment Analysis via a Hybrid Data Model. (January 2014)
- Main Title:
- Evaluating Gene Set Enrichment Analysis via a Hybrid Data Model
- Authors:
- Hua, Jianping
Bittner, Michael L.
Dougherty, Edward R.
- Abstract:
- Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance.
- Is Part Of:
- Cancer informatics. Volume 13(2014)Supplement 1
- Journal:
- Cancer informatics
- Issue:
- Volume 13(2014)Supplement 1
- Issue Display:
- Volume 13, Issue 1 (2014)
- Year:
- 2014
- Volume:
- 13
- Issue:
- 1
- Issue Sort Value:
- 2014-0013-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2014-01
- Subjects:
- gene set enrichment analysis -- feature ranking -- data model -- simulation study
Bioinformatics -- Periodicals
Biology -- Data processing -- Periodicals
Cancer -- Periodicals
Cancer -- Research -- Periodicals
Computational biology -- Periodicals
570.285
- Journal URLs:
- http://insights.sagepub.com/journal.php?journal_id=10&tab=volume
http://www.uk.sagepub.com/home.nav
- DOI:
- 10.4137/CIN.S13305
- Languages:
- English
- ISSNs:
- 1176-9351
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23608.xml
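
The hybrid data model described in the abstract (pick a master gene, derive phenotype labels from it, then resample smaller data sets) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the record does not say how the labels are generated, so the median threshold, the `noise` label-flip rate, and the function names `make_hybrid_labels` and `resample` are all hypothetical choices made here.

```python
import random
from statistics import median


def make_hybrid_labels(expr, master_gene, noise=0.0, rng=None):
    """Derive artificial phenotype labels from one master gene.

    expr: dict mapping gene name -> list of expression values (one per sample).
    Assumption: a sample is labelled 1 when the master gene is above its
    median, else 0; each label is flipped with probability `noise`.
    """
    rng = rng or random.Random(0)
    values = expr[master_gene]
    cut = median(values)
    labels = []
    for v in values:
        label = 1 if v > cut else 0
        if rng.random() < noise:
            label = 1 - label  # hypothetical label noise
        labels.append(label)
    return labels


def resample(expr, labels, n, rng=None):
    """Draw a smaller data set of n samples without replacement."""
    rng = rng or random.Random(0)
    idx = rng.sample(range(len(labels)), n)
    sub_expr = {g: [vals[i] for i in idx] for g, vals in expr.items()}
    sub_labels = [labels[i] for i in idx]
    return sub_expr, sub_labels


# Toy data: real expression values are kept, only the labels are synthetic.
toy_expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "G2": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],
}
toy_labels = make_hybrid_labels(toy_expr, "G1")
small_expr, small_labels = resample(toy_expr, toy_labels, 4)
```

Because the labels depend only on the chosen master gene, the gene sets containing it form a known ground truth, so repeating `resample` with different seeds gives many small data sets on which a method's gene-set ranking can be scored.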