Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach. Issue 519 (3rd July 2017)
- Record Type:
- Journal Article
- Title:
- Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach. Issue 519 (3rd July 2017)
- Main Title:
- Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach
- Authors:
- Fang, Ethan X.
Li, Min-Dian
Jordan, Michael I.
Liu, Han
- Abstract:
- ABSTRACT: Characterizing the functional relevance of transcription factors (TFs) in different biological contexts is pivotal in systems biology. Given the massive amount of genomic data, computational identification of TFs is emerging as a useful approach to bridge functional genomics with disease risk loci. In this article, we use large-scale gene expression and chromatin immunoprecipitation (ChIP) data corpuses to conduct high-throughput TF-biological context association analysis. This work makes two contributions: (i) From a methodological perspective, we propose a unified topic modeling framework for exploring and analyzing large and complex genomic datasets. Under this framework, we develop new statistical optimization algorithms and semiparametric theoretical analysis, which are also applicable to a variety of large-scale data analyses. (ii) From an experimental perspective, our method generates an informative list of tumor-related TFs and their possible effected tumor types. Our data-driven analysis of 38 TFs in 68 tumor biological contexts identifies functional signatures of epigenetic regulators, such as SUZ12 and SET-DB1, and nuclear receptors, in many tumor types. In particular, the TF signature of SUZ12 is present in a broad range of tumor types, many of which have not been reported before. In summary, our work established a robust method to identify the association between TFs and biological contexts. Given the limited amount of genome-wide binding profiles of TFs and the massive number of expression profiles, our work provides a useful tool to deconvolute the gene regulatory network for tumors and other biological contexts. Supplementary materials for this article are available online.
- Is Part Of:
- Journal of the American Statistical Association. Volume 112:Issue 519(2017)
- Journal:
- Journal of the American Statistical Association
- Issue:
- Volume 112:Issue 519(2017)
- Issue Display:
- Volume 112, Issue 519 (2017)
- Year:
- 2017
- Volume:
- 112
- Issue:
- 519
- Issue Sort Value:
- 2017-0112-0519-0000
- Page Start:
- 921
- Page End:
- 932
- Publication Date:
- 2017-07-03
- Subjects:
- Association study -- Genomic data -- Semiparametric modeling -- Topic modeling
Statistics -- Periodicals
Statistiques -- Périodiques
États-Unis -- Statistiques -- Périodiques
519.5
- Journal URLs:
- http://www.jstor.org/journals/01621459.html
http://www.ingentaconnect.com/content/asa/jasa
http://www.tandfonline.com/loi/uasa20
http://www.tandfonline.com/
- DOI:
- 10.1080/01621459.2016.1256812
- Languages:
- English
- ISSNs:
- 0162-1459
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4694.000000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8333.xml