Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences. Issue 3 (21st September 2020)
- Record Type:
- Journal Article
- Title:
- Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences. Issue 3 (21st September 2020)
- Main Title:
- Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences
- Authors:
- He, Cheng
Lin, Guifang
Wei, Hairong
Tang, Haibao
White, Frank F
Valent, Barbara
Liu, Sanzhen
- Abstract:
- Genome sequences provide genomic maps with single-base resolution for exploring genetic content. Sequencing technologies, particularly long reads, have revolutionized genome assembly by producing highly continuous genome sequences. However, current long-read sequencing technologies generate inaccurate reads that contain many errors. Some errors are retained in assembled sequences and are typically not completely corrected by using either long reads or more accurate short reads. This issue is common, but few tools are dedicated to computing error rates or determining error locations. In this study, we developed a novel approach, referred to as k-mer abundance difference (KAD), to compare the inferred copy number of each k-mer indicated by short reads and the observed copy number in the assembly. Simple KAD metrics enable the classification of k-mers into categories that reflect the quality of the assembly. Specifically, the KAD method can be used to identify base errors and estimate the overall error rate. In addition, sequence insertions and deletions as well as sequence redundancy can also be detected. Collectively, KAD is valuable for quality evaluation of genome assemblies and, potentially, provides a diagnostic tool to aid in precise error correction. KAD software has been developed to facilitate public use.
- Is Part Of:
- NAR genomics and bioinformatics. Volume 2:Issue 3(2020)
- Journal:
- NAR genomics and bioinformatics
- Issue:
- Volume 2:Issue 3(2020)
- Issue Display:
- Volume 2, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 2
- Issue:
- 3
- Issue Sort Value:
- 2020-0002-0003-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-09-21
- Subjects:
- Genomics -- Periodicals
Bioinformatics -- Periodicals
572.8
- Journal URLs:
- http://www.oxfordjournals.org/
https://academic.oup.com/nargab
- DOI:
- 10.1093/nargab/lqaa075
- Languages:
- English
- ISSNs:
- 2631-9268
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15537.xml
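
The comparison described in the abstract (read-inferred copy number versus observed copy number in the assembly, per k-mer) can be illustrated with a minimal Python sketch. This is not the KAD software itself. The function names, the `depth` parameter (the read depth corresponding to a single genomic copy) and the log-ratio form of the score are assumptions made for illustration, since this record does not state the formula. Reverse-complement handling and sequencing-error filtering are omitted for brevity.

```python
import math
from collections import Counter


def count_kmers(seqs, k):
    """Count every k-mer (forward strand only) across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kad_like_scores(reads, assembly, k, depth):
    """Return a per-k-mer score comparing read abundance with assembly copy number.

    Hypothetical scoring (an assumption, not taken from the record):
        score = log2((c + depth) / (depth * (n + 1)))
    where c is the k-mer count in reads and n is its count in the assembly.
    The score is 0 when c == depth * n, negative when the assembly holds
    copies that the reads do not support, and positive when the reads
    suggest more copies than the assembly contains.
    """
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers(assembly, k)
    scores = {}
    for kmer in set(read_counts) | set(asm_counts):
        c = read_counts.get(kmer, 0)
        n = asm_counts.get(kmer, 0)
        scores[kmer] = math.log2((c + depth) / (depth * (n + 1)))
    return scores
```

Under these assumptions, a k-mer that occurs in the assembly but never in the reads scores negative and would be a candidate base error, while a k-mer that is well supported by reads but absent from the assembly scores positive and would point to missing sequence.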