An analytic approach for interpretable predictive models in high‐dimensional data in the presence of interactions with exposures. Issue 3 (8th February 2018)
- Record Type:
- Journal Article
- Title:
- An analytic approach for interpretable predictive models in high‐dimensional data in the presence of interactions with exposures. Issue 3 (8th February 2018)
- Main Title:
- An analytic approach for interpretable predictive models in high‐dimensional data in the presence of interactions with exposures
- Authors:
- Bhatnagar, Sahir Rai
Yang, Yi
Khundrakpam, Budhachandra
Evans, Alan C.
Blanchette, Mathieu
Bouchard, Luigi
Greenwood, Celia M.T.
- Abstract:
- Predicting a phenotype and understanding which variables improve that prediction are two very challenging and overlapping problems in the analysis of high‐dimensional (HD) data such as those arising from genomic and brain imaging studies. It is often believed that the number of truly important predictors is small relative to the total number of variables, making computational approaches to variable selection and dimension reduction extremely important. To reduce dimensionality, commonly used two‐step methods first cluster the data in some way, and build models using cluster summaries to predict the phenotype. It is known that important exposure variables can alter correlation patterns between clusters of HD variables, that is, alter network properties of the variables. However, it is not well understood whether such altered clustering is informative in prediction. Here, assuming there is a binary exposure with such network‐altering effects, we explore whether the use of exposure‐dependent clustering relationships in dimension reduction can improve predictive modeling in a two‐step framework. Hence, we propose a modeling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations. With ECLUST, we found improved prediction and variable selection performance compared to methods that do not consider the environment in the clustering step, or to methods that use the original data as features. We further illustrate this modeling framework through the analysis of three data sets from very different fields, each with HD data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.
- Is Part Of:
- Genetic epidemiology. Volume 42:Issue 3(2018)
- Journal:
- Genetic epidemiology
- Issue:
- Volume 42:Issue 3(2018)
- Issue Display:
- Volume 42, Issue 3 (2018)
- Year:
- 2018
- Volume:
- 42
- Issue:
- 3
- Issue Sort Value:
- 2018-0042-0003-0000
- Page Start:
- 233
- Page End:
- 249
- Publication Date:
- 2018-02-08
- Subjects:
- gene‐environment interaction -- high‐dimensional clustering -- prediction models -- topological overlap matrix -- penalized regression
Genetic epidemiology -- Periodicals
Heredity -- Periodicals
Medical geography -- Periodicals
614
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-2272
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/gepi.22112
- Languages:
- English
- ISSNs:
- 0741-0395
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4111.848000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 6039.xml
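
The two‐step idea summarised in the Abstract (cluster the variables using exposure‐dependent correlation, then build models on cluster summaries) can be illustrated with a minimal sketch. This is not the ECLUST implementation or the eclust package API. It assumes a toy data set, a simple threshold on the difference between exposed and unexposed correlations in place of the clustering used in the article, and plain cluster means as summaries. The names `pearson`, `corr_matrix`, `exposure_dependent_clusters`, `cluster_summaries` and the `threshold` parameter are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corr_matrix(rows):
    """Feature-by-feature correlation matrix for a list of sample rows."""
    cols = list(zip(*rows))
    p = len(cols)
    return [[pearson(cols[i], cols[j]) for j in range(p)] for i in range(p)]


def exposure_dependent_clusters(rows, exposure, threshold=0.5):
    """Step 1 (assumed simplification): link two features when their
    correlation differs by more than `threshold` between exposed and
    unexposed samples, then return the connected components."""
    c1 = corr_matrix([r for r, e in zip(rows, exposure) if e == 1])
    c0 = corr_matrix([r for r, e in zip(rows, exposure) if e == 0])
    p = len(c1)
    parent = list(range(p))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(c1[i][j] - c0[i][j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_summaries(rows, clusters):
    """Step 2 input: one summary (here, the mean) per cluster per sample."""
    return [[mean(r[i] for i in c) for c in clusters] for r in rows]


# Toy data (assumption): features 0 and 1 are correlated only when exposed.
random.seed(0)
rows, exposure = [], []
for k in range(400):
    e = k % 2
    x0 = random.gauss(0, 1)
    x1 = x0 + random.gauss(0, 0.1) if e == 1 else random.gauss(0, 1)
    rows.append([x0, x1, random.gauss(0, 1), random.gauss(0, 1)])
    exposure.append(e)

clusters = exposure_dependent_clusters(rows, exposure)
summaries = cluster_summaries(rows, clusters)
print(clusters)
```

In a full two‐step analysis, `summaries` (together with the exposure and, where relevant, interaction terms) would be passed to a prediction model such as a penalized regression; that step is omitted here.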