Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies. Issue 1 (December 2015)
- Record Type:
- Journal Article
- Title:
- Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies. Issue 1 (December 2015)
- Main Title:
- Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies
- Authors:
- Standish, Kristopher
Carland, Tristan
Lockwood, Glenn
Pfeiffer, Wayne
Tatineni, Mahidhar
Huang, C
Lamberth, Sarah
Cherkas, Yauheniya
Brodmerkel, Carrie
Jaeger, Ed
Smith, Lance
Rajagopal, Gunaretnam
Curran, Mark
Schork, Nicholas
- Abstract:
- Motivation: Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. Results: We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study. Conclusions: We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.
- Is Part Of:
- BMC bioinformatics. Volume 16:Issue 1(2015)
- Journal:
- BMC bioinformatics
- Issue:
- Volume 16:Issue 1(2015)
- Issue Display:
- Volume 16, Issue 1 (2015)
- Year:
- 2015
- Volume:
- 16
- Issue:
- 1
- Issue Sort Value:
- 2015-0016-0001-0000
- Page Start:
- 1
- Page End:
- 14
- Publication Date:
- 2015-12
- Subjects:
- Variant calling -- Supercomputing -- Whole-genome sequencing
Bioinformatics -- Periodicals
Computational biology -- Periodicals
570.285
- Journal URLs:
- http://www.biomedcentral.com/bmcbioinformatics/
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=13
http://link.springer.com/
- DOI:
- 10.1186/s12859-015-0736-4
- Languages:
- English
- ISSNs:
- 1471-2105
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - Digital store
British Library HMNTS - ELD Digital store
- Ingest File:
- 9956.xml