Leveraging global gene expression patterns to predict expression of unmeasured genes. (December 2015)
- Record Type:
- Journal Article
- Title:
- Leveraging global gene expression patterns to predict expression of unmeasured genes. (December 2015)
- Main Title:
- Leveraging global gene expression patterns to predict expression of unmeasured genes
- Authors:
- Rudd, James
Zelaya, René
Demidenko, Eugene
Goode, Ellen
Greene, Casey
Doherty, Jennifer - Abstract:
- Abstract Background Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes. Results We developed a greedy gene set selection (GGS) algorithm which returns a DM set of user specified size based on a specific correlation threshold (|rP |) and minimum number of DM genes that must be correlated to an unmeasured gene in order to infer the value of the unmeasured gene (redundancy). We evaluated GGS in the Cancer Genome Atlas (TCGA) HGSC data across 144 combinations of DM size, redundancy (1–3), and |rP | (0.60, 0.65, 0.70). Across the parameter sweep, GGS allows on average 9 times more gene expression information to be captured compared to the DM set alone. GGS successfully augments prognostic HGSC gene sets; the addition of 20 GGS selected genes more than doubles the number of genes whose expression is predictable. Moreover, the expression prediction is highly accurate. After training regression models for the predictable gene set using 2/3 of the TCGA data, the average accuracy (ranked correlation of true and predicted values) in the 1/3Abstract Background Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes. Results We developed a greedy gene set selection (GGS) algorithm which returns a DM set of user specified size based on a specific correlation threshold (|rP |) and minimum number of DM genes that must be correlated to an unmeasured gene in order to infer the value of the unmeasured gene (redundancy). We evaluated GGS in the Cancer Genome Atlas (TCGA) HGSC data across 144 combinations of DM size, redundancy (1–3), and |rP | (0.60, 0.65, 0.70). Across the parameter sweep, GGS allows on average 9 times more gene expression information to be captured compared to the DM set alone. GGS successfully augments prognostic HGSC gene sets; the addition of 20 GGS selected genes more than doubles the number of genes whose expression is predictable. Moreover, the expression prediction is highly accurate. After training regression models for the predictable gene set using 2/3 of the TCGA data, the average accuracy (ranked correlation of true and predicted values) in the 1/3 testing partition and four independent populations is above 0.65 and approaches 0.8 for conservative parameter sets. We observe similar accuracies in the TCGA HGSC RNA-sequencing data. Specifically, the prediction accuracy increases with increasing redundancy and increasing |rP |. Conclusions GGS-selected genes, which maximize expression information about unmeasured genes, can be combined with candidate gene sets as a cost effective way to increase the amount of gene expression information obtained in large studies. This method can be applied to any organism, model system, disease, or tissue type for which whole genome gene expression data exists. … (more)
- Is Part Of:
- BMC genomics. Volume 16:Number 1(2015)
- Journal:
- BMC genomics
- Issue:
- Volume 16:Number 1(2015)
- Issue Display:
- Volume 16, Issue 1 (2015)
- Year:
- 2015
- Volume:
- 16
- Issue:
- 1
- Issue Sort Value:
- 2015-0016-0001-0000
- Page Start:
- 1
- Page End:
- 11
- Publication Date:
- 2015-12
- Subjects:
- Gene expression -- Greedy gene set selection -- GGS -- Imputation
Genomes -- Periodicals
Gene mapping -- Periodicals
Genomics -- Periodicals
Base Sequence -- Periodicals
Chromosome Mapping -- Periodicals
Genetic Techniques -- Periodicals
Sequence Analysis, DNA -- Periodicals
572.8605 - Journal URLs:
- http://www.biomedcentral.com/bmcgenomics/ ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=32 ↗
http://link.springer.com/ ↗ - DOI:
- 10.1186/s12864-015-2250-5 ↗
- Languages:
- English
- ISSNs:
- 1471-2164
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 9851.xml