Accurate inference of isoforms from multiple sample RNA-Seq data. (December 2015)
- Record Type:
- Journal Article
- Title:
- Accurate inference of isoforms from multiple sample RNA-Seq data. (December 2015)
- Main Title:
- Accurate inference of isoforms from multiple sample RNA-Seq data
- Authors:
- Tasnim, Masruba
Ma, Shining
Yang, Ei-Wen
Jiang, Tao
Li, Wei
- Abstract:
- Background: RNA-Seq based transcriptome assembly has become a fundamental technique for studying expressed mRNAs (i.e., transcripts or isoforms) in a cell using high-throughput sequencing technologies, and it serves as a basis for analyzing the structural and quantitative differences of expressed isoforms between samples. However, current transcriptome assembly algorithms are not specifically designed to handle the large numbers of errors inherent in real RNA-Seq datasets, especially those involving multiple samples, which makes downstream differential analysis difficult. On the other hand, multiple-sample RNA-Seq datasets may provide more information than single-sample datasets, and this information could be used to improve the performance of transcriptome assembly and abundance estimation, but existing assembly tools overlook it.
Results: We formulate a computational framework of transcriptome assembly that is capable of handling noisy RNA-Seq reads and multiple-sample RNA-Seq datasets efficiently. We show that finding an optimal solution under this framework is NP-hard; we therefore develop an efficient heuristic algorithm, called Iterative Shortest Path (ISP), based on linear programming (LP) and integer linear programming (ILP). Our preliminary experimental results on both simulated and real datasets, together with comparisons against existing assembly tools, demonstrate that (i) the ISP algorithm is able to assemble transcriptomes with greatly increased precision while keeping the same level of sensitivity, especially when many samples are involved, and (ii) its assembly results help improve downstream differential analysis. The source code of ISP is freely available at http://alumni.cs.ucr.edu/~liw/isp.html.
- Is Part Of:
- BMC genomics. Volume 16:Number 2 (2015)
- Journal:
- BMC genomics
- Issue:
- Volume 16:Number 2 (2015)
- Issue Display:
- Volume 16, Issue 2 (2015)
- Year:
- 2015
- Volume:
- 16
- Issue:
- 2
- Issue Sort Value:
- 2015-0016-0002-0000
- Page Start:
- 1
- Page End:
- 12
- Publication Date:
- 2015-12
- Subjects:
- Genomes -- Periodicals
Gene mapping -- Periodicals
Genomics -- Periodicals
Base Sequence -- Periodicals
Chromosome Mapping -- Periodicals
Genetic Techniques -- Periodicals
Sequence Analysis, DNA -- Periodicals
572.8605
- Journal URLs:
- http://www.biomedcentral.com/bmcgenomics/
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=32
http://link.springer.com/
- DOI:
- 10.1186/1471-2164-16-S2-S15
- Languages:
- English
- ISSNs:
- 1471-2164
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 9828.xml