Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies. Issue 520 (2nd October 2017)
- Record Type:
- Journal Article
- Title:
- Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies. Issue 520 (2nd October 2017)
- Main Title:
- Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies
- Authors:
- Tao, Ran
Zeng, Donglin
Lin, Dan-Yu
- Abstract:
- In modern epidemiological and clinical studies, the covariates of interest may involve genome sequencing, biomarker assay, or medical imaging and thus are prohibitively expensive to measure on a large number of subjects. A cost-effective solution is the two-phase design, under which the outcome and inexpensive covariates are observed for all subjects during the first phase and that information is used to select subjects for measurements of expensive covariates during the second phase. For example, subjects with extreme values of quantitative traits were selected for whole-exome sequencing in the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP). Herein, we consider general two-phase designs, where the outcome can be continuous or discrete, and inexpensive covariates can be continuous and correlated with expensive covariates. We propose a semiparametric approach to regression analysis by approximating the conditional density functions of expensive covariates given inexpensive covariates with B-spline sieves. We devise a computationally efficient and numerically stable EM-algorithm to maximize the sieve likelihood. In addition, we establish the consistency, asymptotic normality, and asymptotic efficiency of the estimators. Furthermore, we demonstrate the superiority of the proposed methods over existing ones through extensive simulation studies. Finally, we present applications to the aforementioned NHLBI ESP. Supplementary materials for this article are available online …
- Is Part Of:
- Journal of the American Statistical Association. Volume 112:Issue 520(2017)
- Journal:
- Journal of the American Statistical Association
- Issue:
- Volume 112:Issue 520(2017)
- Issue Display:
- Volume 112, Issue 520 (2017)
- Year:
- 2017
- Volume:
- 112
- Issue:
- 520
- Issue Sort Value:
- 2017-0112-0520-0000
- Page Start:
- 1468
- Page End:
- 1476
- Publication Date:
- 2017-10-02
- Subjects:
- Biased sampling -- EM algorithm -- Genome sequencing -- Response-selective sampling -- Semiparametric efficiency -- Sieve approximation
Statistics -- Periodicals
Statistiques -- Périodiques
États-Unis -- Statistiques -- Périodiques
519.5
- Journal URLs:
- http://www.jstor.org/journals/01621459.html
http://www.ingentaconnect.com/content/asa/jasa
http://www.tandfonline.com/loi/uasa20
http://www.tandfonline.com/
- DOI:
- 10.1080/01621459.2017.1295864
- Languages:
- English
- ISSNs:
- 0162-1459
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4694.000000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17300.xml