The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies. Issue 1 (2nd January 2020)
- Record Type:
- Journal Article
- Title:
- The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies. Issue 1 (2nd January 2020)
- Main Title:
- The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies
- Authors:
- Wang, Weiwen
Das, Ashutosh
Kainer, David
Schalamun, Miriam
Morales-Suarez, Alejandro
Schwessinger, Benjamin
Lanfear, Robert
- Abstract:
- Background: Eucalyptus pauciflora (the snow gum) is a long-lived tree of high economic and ecological importance. Currently, little genomic information is available for E. pauciflora. Here, we assemble the genome of E. pauciflora with several different methods and combine multiple existing and novel approaches to help select the best genome assembly. Findings: We generated high-coverage long-read (Nanopore, 174×) and short-read (Illumina, 228×) data from a single E. pauciflora individual and compared assemblies from 5 assemblers (Canu, SMARTdenovo, Flye, Marvel, and MaSuRCA) using 2 minimum read-length thresholds (1 kb and 35 kb). A key component of our approach is to hold out a randomly selected ∼10% of both the long and the short reads from the assemblies and use them as a validation set for assessing assemblies. Using this validation set along with a range of existing tools, we compared the assemblies in 8 ways: contig N50, BUSCO scores, LAI (long terminal repeat assembly index) scores, assembly ploidy, base-level error rate, CGAL (computing genome assembly likelihoods) scores, structural variation, and genome sequence similarity. Our results showed that MaSuRCA generated the best assembly, which is 594.87 Mb in size, with a contig N50 of 3.23 Mb and an estimated error rate of ∼0.006 errors per base. Conclusions: We report a draft genome of E. pauciflora, which will be a valuable resource for further genomic studies of eucalypts. The approaches for assessing and comparing genomes should help in assessing and choosing among many potential genome assemblies from a single dataset.
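Two of the ideas in the abstract, the contig N50 statistic and a random hold-out of ∼10% of reads as a validation set, can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names `contig_n50` and `split_reads`, the fixed seed, and the representation of reads as plain ID strings are hypothetical simplifications; a real workflow would operate on FASTA/FASTQ files and apply the split to the long and short reads separately.

```python
import random
from typing import List, Sequence, Tuple


def contig_n50(lengths: Sequence[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contigs supplied")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def split_reads(read_ids: Sequence[str], holdout_fraction: float = 0.10,
                seed: int = 1) -> Tuple[List[str], List[str]]:
    """Randomly hold out a fraction of reads (hypothetical helper).

    Returns (assembly_reads, validation_reads); the two lists are disjoint
    and together contain every input read exactly once.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(read_ids)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


if __name__ == "__main__":
    print(contig_n50([5, 4, 3, 2, 1]))  # prints 4
    assembly, validation = split_reads([f"read{i}" for i in range(1000)])
    print(len(assembly), len(validation))  # prints 900 100
```

Keeping the validation reads out of every assembly means that any read-based score computed on them (for example, a likelihood or an error rate) is not inflated by the assembler having already seen those reads.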
- Is Part Of:
- GigaScience. Volume 9:Issue 1(2020)
- Journal:
- GigaScience
- Issue:
- Volume 9:Issue 1(2020)
- Issue Display:
- Volume 9, Issue 1 (2020)
- Year:
- 2020
- Volume:
- 9
- Issue:
- 1
- Issue Sort Value:
- 2020-0009-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-01-02
- Subjects:
- long-read assembly -- nanopore sequencing -- hybrid assembly -- genome assessment -- assembly comparison -- Eucalyptus pauciflora -- haplotig separation -- genome polishing
Information storage and retrieval systems -- Research -- Periodicals
Biology -- Research -- Periodicals
Medical sciences -- Research -- Periodicals
Database management -- Periodicals
570.285
- Journal URLs:
- http://www.gigasciencejournal.com/
http://www.oxfordjournals.org/
- DOI:
- 10.1093/gigascience/giz160
- Languages:
- English
- ISSNs:
- 2047-217X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12539.xml