Advancing next‐generation sequencing data analytics with scalable distributed infrastructure. (19th June 2013)
- Record Type:
- Journal Article
- Title:
- Advancing next‐generation sequencing data analytics with scalable distributed infrastructure. (19th June 2013)
- Main Title:
- Advancing next‐generation sequencing data analytics with scalable distributed infrastructure
- Authors:
- Kim, Joohyun
Maddineni, Sharath
Jha, Shantenu
Qiu, Judy
Foster, Ian
Taylor, Ronald
Loidl, Hans‐Wolfgang
Singer, Jeremy
- Abstract:
- SUMMARY: With the emergence of popular next‐generation sequencing (NGS)‐based genome‐wide protocols such as chromatin immunoprecipitation followed by sequencing (ChIP‐Seq) and RNA‐Seq, there is a growing need for research and infrastructure to support the requirement of effectively analyzing NGS data. Such research and infrastructure do not replace but complement algorithmic advances in analyzing NGS data. We present a runtime environment, Distributed Application Runtime Environment, that supports the scalable, flexible, and extensible composition of capabilities that cover the primary requirements of NGS‐based analytics. In this work, we use BFAST as a representative stand‐alone tool used for NGS data analysis and a ChIP‐Seq pipeline as a representative pipeline‐based approach to analyze the computational requirements. We analyze the performance characteristics of BFAST and understand its dependency on different input parameters. The computational complexity of genome‐wide mapping using BFAST, amongst other factors, depends upon the size of a reference genome and the data size of short reads. Characterizing the performance suggests that the mapping benefits from both scaling‐up (increased fine‐grained parallelism) and scaling‐out (task‐level parallelism – local and distributed). For certain problem instances, scaling‐out can be a more efficient approach than scaling‐up. On the basis of investigations using the pipeline for ChIP‐Seq, we also discuss the importance of dynamical execution of tasks. Copyright © 2013 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 26:Number 4(2014:Mar.)
- Journal:
- Concurrency and computation
- Issue:
- Volume 26:Number 4(2014:Mar.)
- Issue Display:
- Volume 26, Issue 4 (2014)
- Year:
- 2014
- Volume:
- 26
- Issue:
- 4
- Issue Sort Value:
- 2014-0026-0004-0000
- Page Start:
- 894
- Page End:
- 906
- Publication Date:
- 2013-06-19
- Subjects:
- Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3013
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 4363.xml