Iterative hard thresholding for model selection in genome‐wide association studies. Issue 8 (6th September 2017)
- Record Type:
- Journal Article
- Title:
- Iterative hard thresholding for model selection in genome‐wide association studies. Issue 8 (6th September 2017)
- Main Title:
- Iterative hard thresholding for model selection in genome‐wide association studies
- Authors:
- Keys, Kevin L.
Chen, Gary K.
Lange, Kenneth
- Abstract:
- A genome‐wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly collected unrelateds and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 penalty (LASSO) or the minimax concave penalty (MCP) is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. Here, we compare LASSO and MCP penalized regression to iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection and comparable in speed to both methods of penalized regression. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. Availability: Source code is freely available at https://github.com/klkeys/IHT.jl.
- Is Part Of:
- Genetic epidemiology. Volume 41:Issue 8(2017)
- Journal:
- Genetic epidemiology
- Issue:
- Volume 41:Issue 8(2017)
- Issue Display:
- Volume 41, Issue 8 (2017)
- Year:
- 2017
- Volume:
- 41
- Issue:
- 8
- Issue Sort Value:
- 2017-0041-0008-0000
- Page Start:
- 756
- Page End:
- 768
- Publication Date:
- 2017-09-06
- Subjects:
- genetic association studies -- greedy algorithm -- parallel computing -- sparse regression
Genetic epidemiology -- Periodicals
Heredity -- Periodicals
Medical geography -- Periodicals
614
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-2272
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/gepi.22068
- Languages:
- English
- ISSNs:
- 0741-0395
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4111.848000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5362.xml