Jabba: hybrid error correction for long sequencing reads. Issue 1 (December 2016)
- Record Type:
- Journal Article
- Title:
- Jabba: hybrid error correction for long sequencing reads. Issue 1 (December 2016)
- Main Title:
- Jabba: hybrid error correction for long sequencing reads
- Authors:
- Miclotte, Giles
Heydari, Mahdi
Demeester, Piet
Rombauts, Stephane
Van de Peer, Yves
Audenaert, Pieter
Fostier, Jan
- Abstract:
- Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned.
Results: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented.
Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
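The abstract names maximal exact matches (MEMs) as the seeds for pseudo alignment. As a minimal illustrative sketch only (not Jabba's implementation), the Python below finds MEMs between a long read and a single reference string: a match is kept if it cannot be extended to the left or right and is at least `min_len` characters long. The names `Mem`, `find_mems` and `min_len` are hypothetical, and the reference here is a flat string rather than the corrected de Bruijn graph the record describes.

```python
from typing import List, NamedTuple


class Mem(NamedTuple):
    """Hypothetical MEM record: read[read_pos:read_pos+length] == ref[ref_pos:ref_pos+length]."""
    read_pos: int
    ref_pos: int
    length: int


def find_mems(read: str, ref: str, min_len: int) -> List[Mem]:
    """Naive O(len(read) * len(ref)) MEM search between two strings (illustrative only).

    A match is left-maximal if it starts at the beginning of either string or the
    preceding characters differ; extending to the right until a mismatch (or the
    end of either string) makes it right-maximal.
    """
    mems: List[Mem] = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Skip start positions that could be extended further to the left.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of either string.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append(Mem(i, j, length))
    return mems


if __name__ == "__main__":
    # A read with one substitution (position 4) relative to the reference.
    print(find_mems("ACGTTGCA", "ACGTAGCA", 3))
```

Here the single mismatch splits the read into two seeds, `Mem(0, 0, 4)` and `Mem(5, 5, 3)`; in a seed-and-extend scheme the gap between such seeds is what the extension step must bridge. A practical tool would locate MEMs with an index structure such as a suffix array rather than this quadratic scan.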
- Is Part Of:
- Algorithms for molecular biology. Volume 11:Issue 1(2016)
- Journal:
- Algorithms for molecular biology
- Issue:
- Volume 11:Issue 1(2016)
- Issue Display:
- Volume 11, Issue 1 (2016)
- Year:
- 2016
- Volume:
- 11
- Issue:
- 1
- Issue Sort Value:
- 2016-0011-0001-0000
- Page Start:
- 1
- Page End:
- 12
- Publication Date:
- 2016-12
- Subjects:
- Sequence analysis -- Error correction -- de Bruijn graph -- Maximal exact matches
Molecular biology -- Mathematical models -- Periodicals
Algorithms -- Periodicals
Bioinformatics -- Periodicals
572.8015118
- Journal URLs:
- http://pubmedcentral.com/tocrender.fcgi?journal=403&action=archive
http://www.almob.org/
http://link.springer.com/
- DOI:
- 10.1186/s13015-016-0075-7
- Languages:
- English
- ISSNs:
- 1748-7188
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9845.xml