NtEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies. Issue 5 (14th May 2022)
- Record Type:
- Journal Article
- Title:
- NtEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies. Issue 5 (14th May 2022)
- Main Title:
- NtEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies
- Authors:
- Li, Janet X.
Coombe, Lauren
Wong, Johnathan
Birol, Inanç
Warren, René L.
- Abstract:
- High‐quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long‐read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory‐intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment‐free, k‐mer‐based genome finishing protocol that employs memory‐efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error‐corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Automated long‐read genome finishing with short reads. Support Protocol: Selecting optimal values for k‐mer lengths (k) and Bloom filter size (b) …
- Is Part Of:
- Current protocols. Volume 2:Issue 5(2022)
- Journal:
- Current protocols
- Issue:
- Volume 2:Issue 5(2022)
- Issue Display:
- Volume 2, Issue 5 (2022)
- Year:
- 2022
- Volume:
- 2
- Issue:
- 5
- Issue Sort Value:
- 2022-0002-0005-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2022-05-14
- Subjects:
- assembly finishing -- Bloom filter -- hybrid assembly polishing -- k‐mer -- long‐read genome assembly
Life sciences -- Laboratory manuals -- Periodicals
Biology -- Laboratory manuals -- Periodicals
Life sciences -- Technique -- Periodicals
Biology -- Technique -- Periodicals
570.028
- Journal URLs:
- https://currentprotocols.onlinelibrary.wiley.com/journal/26911299
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpz1.442
- Languages:
- English
- ISSNs:
- 2691-1299
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21790.xml