Value of Mendelian Laws of Segregation in Families: Data Quality Control, Imputation, and Beyond. Issue 1 (September 2014)
- Record Type:
- Journal Article
- Title:
- Value of Mendelian Laws of Segregation in Families: Data Quality Control, Imputation, and Beyond. Issue 1 (September 2014)
- Main Title:
- Value of Mendelian Laws of Segregation in Families: Data Quality Control, Imputation, and Beyond
- Authors:
- Blue, Elizabeth M.
Sun, Lei
Tintle, Nathan L.
Wijsman, Ellen M.
Paterson, Andrew
Bickeböller, Heike
Almasy, Laura
- Abstract:
- When analyzing family data, we dream of perfectly informative data, even whole‐genome sequences (WGSs) for all family members. Reality intervenes, and we find that next‐generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome‐wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single‐nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule‐based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability‐based methods. Incorporating population‐level data into pedigree‐based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.
- Is Part Of:
- Genetic epidemiology. Volume 38:Issue 1(2014)
- Journal:
- Genetic epidemiology
- Issue:
- Volume 38:Issue 1(2014)
- Issue Display:
- Volume 38, Issue 1 (2014)
- Year:
- 2014
- Volume:
- 38
- Issue:
- 1
- Issue Sort Value:
- 2014-0038-0001-0000
- Page Start:
- S21
- Page End:
- S28
- Publication Date:
- 2014-09
- Subjects:
- Genetic epidemiology -- Periodicals
Heredity -- Periodicals
Medical geography -- Periodicals
614
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-2272
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/gepi.21821
- Languages:
- English
- ISSNs:
- 0741-0395
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4111.848000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 4131.xml