Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction. Issue 3 (17th June 2020)
- Record Type:
- Journal Article
- Title:
- Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction. Issue 3 (17th June 2020)
- Main Title:
- Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction
- Authors:
- Dou, Jinzhuang
Wu, Degang
Ding, Lin
Wang, Kai
Jiang, Minghui
Chai, Xiaoran
Reilly, Dermot F
Tai, E Shyong
Liu, Jianjun
Sim, Xueling
Cheng, Shanshan
Wang, Chaolong
- Abstract:
- Whole-exome sequencing (WES) has been widely used to study the role of protein-coding variants in genetic diseases. Non-coding regions, typically covered by sparse off-target data, are often discarded by conventional WES analyses. Here, we develop a genotype calling pipeline named WEScall to analyse both target and off-target data. We leverage linkage disequilibrium shared within study samples and from an external reference panel to improve genotyping accuracy. In an application to WES of 2527 Chinese and Malays, WEScall can reduce the genotype discordance rate from 0.26% (SE = 6.4 × 10⁻⁶) to 0.08% (SE = 3.6 × 10⁻⁶) across 1.1 million single nucleotide polymorphisms (SNPs) in the deeply sequenced target regions. Furthermore, we obtain genotypes at 0.70% (SE = 3.0 × 10⁻⁶) discordance rate across 5.2 million off-target SNPs, which had ~1.2× mean sequencing depth. Using this dataset, we perform genome-wide association studies of 10 metabolic traits. Despite our small sample size, we identify 10 loci at genome-wide significance (P < 5 × 10⁻⁸), including eight well-established loci. The two novel loci, both associated with glycated haemoglobin levels, are GPATCH8-SLC4A1 (rs369762319, P = 2.56 × 10⁻¹²) and ROR2 (rs1201042, P = 3.24 × 10⁻⁸). Finally, using summary statistics from UK Biobank and Biobank Japan, we show that polygenic risk prediction can be significantly improved for six out of nine traits by incorporating off-target data (P < 0.01). These results demonstrate WEScall as a useful tool to facilitate WES studies with decent amounts of off-target data.
- Is Part Of:
- Briefings in bioinformatics. Volume 22:Issue 3(2021)
- Journal:
- Briefings in bioinformatics
- Issue:
- Volume 22:Issue 3(2021)
- Issue Display:
- Volume 22, Issue 3 (2021)
- Year:
- 2021
- Volume:
- 22
- Issue:
- 3
- Issue Sort Value:
- 2021-0022-0003-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-06-17
- Subjects:
- whole-exome sequencing -- linkage disequilibrium -- low-coverage off-target data -- genome-wide association study -- polygenic risk score
Genetics -- Data processing -- Periodicals
Molecular biology -- Data processing -- Periodicals
Genomes -- Data processing -- Periodicals
572.80285
- Journal URLs:
- http://bib.oxfordjournals.org
http://www.oxfordjournals.org/content?genre=journal&issn=1477-4054
http://ukcatalogue.oup.com/
http://firstsearch.oclc.org
- DOI:
- 10.1093/bib/bbaa084
- Languages:
- English
- ISSNs:
- 1467-5463
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2283.958363
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24960.xml