Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows–Wheeler Transform. (23rd December 2020)
- Record Type:
- Journal Article
- Title:
- Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows–Wheeler Transform. (23rd December 2020)
- Main Title:
- Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows–Wheeler Transform
- Authors:
- Freyman, William A
McManus, Kimberly F
Shringarpure, Suyash S
Jewett, Ethan M
Bryc, Katarzyna
Auton, Adam
- Editors:
- Falush, Daniel
- Abstract:
- Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows–Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).
- Is Part Of:
- Molecular biology and evolution. Volume 38:Number 5(2021)
- Journal:
- Molecular biology and evolution
- Issue:
- Volume 38:Number 5(2021)
- Issue Display:
- Volume 38, Issue 5 (2021)
- Year:
- 2021
- Volume:
- 38
- Issue:
- 5
- Issue Sort Value:
- 2021-0038-0005-0000
- Page Start:
- 2131
- Page End:
- 2151
- Publication Date:
- 2020-12-23
- Subjects:
- identity-by-descent -- templated positional Burrows–Wheeler transform -- population genetics
Molecular biology -- Periodicals
Molecular evolution -- Periodicals
Evolution, Molecular -- Periodicals
Molecular Biology -- Periodicals
572.8
- Journal URLs:
- http://mbe.oxfordjournals.org/ ↗
http://www.molbiolevol.org/ ↗
http://ukcatalogue.oup.com/ ↗
http://firstsearch.oclc.org ↗
http://firstsearch.oclc.org/journal=0737-7038;screen=info;ECOIP ↗
- DOI:
- 10.1093/molbev/msaa328 ↗
- Languages:
- English
- ISSNs:
- 0737-4038
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 5900.782000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16780.xml