Informed kmer selection for de novo transcriptome assembly. (28th April 2016)
- Record Type:
- Journal Article
- Title:
- Informed kmer selection for de novo transcriptome assembly. (28th April 2016)
- Main Title:
- Informed kmer selection for de novo transcriptome assembly
- Authors:
- Durai, Dilip A.
Schulz, Marcel H.
- Abstract:
- Motivation: De novo transcriptome assembly is an integral part for many RNA-seq workflows. Common applications include sequencing of non-model organisms, cancer or meta transcriptomes. Most de novo transcriptome assemblers use the de Bruijn graph (DBG) as the underlying data structure. The quality of the assemblies produced by such assemblers is highly influenced by the exact word length k. As such no single kmer value leads to optimal results. Instead, DBGs over different kmer values are built and the assemblies are merged to improve sensitivity. However, no studies have investigated thoroughly the problem of automatically learning at which kmer value to stop the assembly. Instead a suboptimal selection of kmer values is often used in practice. Results: Here we investigate the contribution of a single kmer value in a multi-kmer based assembly approach. We find that a comparative clustering of related assemblies can be used to estimate the importance of an additional kmer assembly. Using a model fit based algorithm we predict the kmer value at which no further assemblies are necessary. Our approach is tested with different de novo assemblers for datasets with different coverage values and read lengths. Further, we suggest a simple post processing step that significantly improves the quality of multi-kmer assemblies. Conclusion: We provide an automatic method for limiting the number of kmer values without a significant loss in assembly quality but with savings in assembly time. This is a step forward to making multi-kmer methods more reliable and easier to use. Availability and Implementation: A general implementation of our approach can be found under: https://github.com/SchulzLab/KREATION. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: mschulz@mmci.uni-saarland.de
- Is Part Of:
- Bioinformatics. Volume 32:Number 11(2016)
- Journal:
- Bioinformatics
- Issue:
- Volume 32:Number 11(2016)
- Issue Display:
- Volume 32, Issue 11 (2016)
- Year:
- 2016
- Volume:
- 32
- Issue:
- 11
- Issue Sort Value:
- 2016-0032-0011-0000
- Page Start:
- 1670
- Page End:
- 1677
- Publication Date:
- 2016-04-28
- Subjects:
- Bioinformatics -- Periodicals
Genomics -- Data processing -- Periodicals
Computational biology -- Periodicals
572.80285
- Journal URLs:
- http://bioinformatics.oxfordjournals.org
http://firstsearch.oclc.org
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/bioinformatics/btw217
- Languages:
- English
- ISSNs:
- 1367-4803
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2072.348000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12387.xml