On the association analysis of genome‐sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows. Issue 4 (20th March 2017)
- Record Type:
- Journal Article
- Title:
- On the association analysis of genome‐sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows. Issue 4 (20th March 2017)
- Main Title:
- On the association analysis of genome‐sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows
- Authors:
- Loehlein Fier, Heide
Prokopenko, Dmitry
Hecker, Julian
Cho, Michael H.
Silverman, Edwin K.
Weiss, Scott T.
Tanzi, Rudolph E.
Lange, Christoph
- Abstract:
- ABSTRACT: For the association analysis of whole‐genome sequencing (WGS) studies, we propose an efficient and fast spatial‐clustering algorithm. Compared to existing analysis approaches for WGS data, that define the tested regions either by sliding or consecutive windows of fixed sizes along variants, a meaningful grouping of nearby variants into consecutive regions has the advantage that, compared to sliding window approaches, the number of tested regions is likely to be smaller. In comparison to consecutive, fixed‐window approaches, our approach is likely to group nearby variants together. Given existing biological evidence that disease‐associated mutations tend to physically cluster in specific regions along the chromosome, the identification of meaningful groups of nearby located variants could thus lead to a potential power gain for association analysis. Our algorithm defines consecutive genomic regions based on the physical positions of the variants, assuming an inhomogeneous Poisson process and groups together nearby variants. As parameters are estimated locally, the algorithm takes the differing variant density along the chromosome into account and provides locally optimal partitioning of variants into consecutive regions. An R‐implementation of the algorithm is provided. We discuss the theoretical advances of our algorithm compared to existing, window‐based approaches and show the performance and advantage of our introduced algorithm in a simulation study and by an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region‐based association signal of ITGB3 replicates in an independent data set and achieves formally genome‐wide significance. Software Implementation: An implementation of the algorithm in R is available at: https://github.com/heidefier/cluster_wgs_data
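The partitioning idea described in the abstract (consecutive, nonoverlapping regions defined from physical positions, with parameters estimated locally) can be illustrated with a small Python sketch. This is not the authors' algorithm and not a port of their R implementation. It is a simplified stand-in that assumes gaps between neighbouring variants are locally exponential, as they would be under a Poisson process with a locally constant rate, and starts a new region whenever a gap is improbably large under the rate estimated from nearby gaps. The function name `partition_variants` and the parameters `k` and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def partition_variants(positions, k=5, alpha=0.05):
    """Split variant positions into consecutive, nonoverlapping regions.

    Illustrative assumption (not the published method): a region boundary
    is placed at a gap whose exponential tail probability, under a rate
    estimated from up to k neighbouring gaps on each side, is below alpha.
    """
    if not positions:
        return []
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    regions = [[pos[0]]]
    for i, gap in enumerate(gaps):
        # Local estimate: mean of nearby gaps, excluding the gap under test.
        lo, hi = max(0, i - k), min(len(gaps), i + k + 1)
        neighbours = gaps[lo:i] + gaps[i + 1:hi]
        mean_gap = sum(neighbours) / len(neighbours) if neighbours else gap
        # P(gap >= observed) for an exponential with rate 1 / mean_gap.
        if mean_gap > 0:
            tail = math.exp(-gap / mean_gap)
        else:
            tail = 1.0 if gap == 0 else 0.0
        if tail < alpha:
            regions.append([])  # unusually large gap: start a new region
        regions[-1].append(pos[i + 1])
    return regions


if __name__ == "__main__":
    example = [100, 110, 120, 130, 5000, 5010, 5020, 5030]
    print(partition_variants(example))
    # [[100, 110, 120, 130], [5000, 5010, 5020, 5030]]
```

Because the rate is re-estimated around every gap, the same absolute distance can separate regions in a variant-dense stretch yet be absorbed within a region in a sparse one, which is the local-density behaviour the abstract emphasises. A single very large gap also inflates the local mean of its neighbours, so this sketch is deliberately conservative about placing boundaries next to one.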
- Is Part Of:
- Genetic epidemiology. Volume 41:Issue 4(2017)
- Journal:
- Genetic epidemiology
- Issue:
- Volume 41:Issue 4(2017)
- Issue Display:
- Volume 41, Issue 4 (2017)
- Year:
- 2017
- Volume:
- 41
- Issue:
- 4
- Issue Sort Value:
- 2017-0041-0004-0000
- Page Start:
- 332
- Page End:
- 340
- Publication Date:
- 2017-03-20
- Subjects:
- WGS data -- clustering -- genetic association analysis
Genetic epidemiology -- Periodicals
Heredity -- Periodicals
Medical geography -- Periodicals
614
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-2272
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/gepi.22040
- Languages:
- English
- ISSNs:
- 0741-0395
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4111.848000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1737.xml