Coverage-Versus-Length Plots, a Simple Quality Control Step for de Novo Yeast Genome Sequence Assemblies. Issue 3 (1st March 2019)
- Record Type:
- Journal Article
- Title:
- Coverage-Versus-Length Plots, a Simple Quality Control Step for de Novo Yeast Genome Sequence Assemblies. Issue 3 (1st March 2019)
- Main Title:
- Coverage-Versus-Length Plots, a Simple Quality Control Step for de Novo Yeast Genome Sequence Assemblies
- Authors:
- Douglass, Alexander P
O'Brien, Caoimhe E
Offei, Benjamin
Coughlan, Aisling Y
Ortiz-Merino, Raúl A
Butler, Geraldine
Byrne, Kevin P
Wolfe, Kenneth H
- Abstract:
- Illumina sequencing has revolutionized yeast genomics, with prices for commercial draft genome sequencing now below $200. The popular SPAdes assembler makes it simple to generate a de novo genome assembly for any yeast species. However, whereas making genome assemblies has become routine, understanding what they contain is still challenging. Here, we show how graphing the information that SPAdes provides about the length and coverage of each scaffold can be used to investigate the nature of an assembly, and to diagnose possible problems. Scaffolds derived from mitochondrial DNA, ribosomal DNA, and yeast plasmids can be identified by their high coverage. Contaminating data, such as cross-contamination from other samples in a multiplex sequencing run, can be identified by its low coverage. Scaffolds derived from the bacteriophage PhiX174 and Lambda DNAs that are frequently used as molecular standards in Illumina protocols can also be detected. Assemblies of yeast genomes with high heterozygosity, such as interspecies hybrids, often contain two types of scaffold: regions of the genome where the two alleles assembled into two separate scaffolds and each has a coverage level C, and regions where the two alleles co-assembled (collapsed) into a single scaffold that has a coverage level 2C. Visualizing the data with Coverage-vs.-Length (CVL) plots, which can be done using Microsoft Excel or Google Sheets, provides a simple method to understand the structure of a genome assembly and detect aberrant scaffolds or contigs. We provide a Python script that allows assemblies to be filtered to remove contaminants identified in CVL plots.
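The length and coverage values the abstract refers to can be read from the sequence names that SPAdes writes into its FASTA output, which follow the pattern `NODE_<n>_length_<bp>_cov_<x>`. The following is a minimal sketch of that idea, not the script distributed with the article: the function names, the example headers, and the coverage thresholds are all hypothetical and chosen only for illustration.

```python
import re

# SPAdes names each sequence like: NODE_1_length_250000_cov_35.2
HEADER_RE = re.compile(r"^>NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_spades_headers(lines):
    """Return (name, length, coverage) for every SPAdes-style FASTA header."""
    records = []
    for line in lines:
        match = HEADER_RE.match(line)
        if match:  # sequence lines and non-SPAdes headers are skipped
            name = line[1:].strip()
            records.append((name, int(match.group(2)), float(match.group(3))))
    return records


def to_tsv(records):
    """Format records as tab-separated text for pasting into a spreadsheet."""
    rows = ["name\tlength\tcoverage"]
    rows += [f"{name}\t{length}\t{cov}" for name, length, cov in records]
    return "\n".join(rows)


def filter_by_coverage(records, min_cov, max_cov):
    """Keep records whose coverage lies in [min_cov, max_cov] (hypothetical cutoffs)."""
    return [r for r in records if min_cov <= r[2] <= max_cov]


# Invented example input: three headers and one sequence line.
example_lines = [
    ">NODE_1_length_250000_cov_35.2",
    "ACGTACGT",
    ">NODE_2_length_80000_cov_410.0",
    ">NODE_3_length_1200_cov_1.3",
]
records = parse_spades_headers(example_lines)
kept = filter_by_coverage(records, min_cov=10.0, max_cov=100.0)
print(to_tsv(records))
```

The tab-separated output can be pasted into Microsoft Excel or Google Sheets and charted as a scatter plot of coverage against length. Any cutoffs passed to `filter_by_coverage` would need to be chosen by inspecting that plot for the assembly in question.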
- Is Part Of:
- G3. Volume 9:Issue 3(2019)
- Journal:
- G3
- Issue:
- Volume 9:Issue 3(2019)
- Issue Display:
- Volume 9, Issue 3 (2019)
- Year:
- 2019
- Volume:
- 9
- Issue:
- 3
- Issue Sort Value:
- 2019-0009-0003-0000
- Page Start:
- 879
- Page End:
- 887
- Publication Date:
- 2019-03-01
- Subjects:
- genomics -- genome assembly -- bioinformatics -- yeast
Genetics -- Research -- Periodicals
Genomics -- Periodicals
Genetics
Genomics
Genes
Genetics -- Research
Genomics
Electronic journals
Periodical
Periodicals
Fulltext
Internet Resources
Periodicals
572.8
- Journal URLs:
- https://academic.oup.com/g3journal
http://bibpurl.oclc.org/web/43467
http://www.g3journal.org
http://www.oxfordjournals.org/
- DOI:
- 10.1534/g3.118.200745
- Languages:
- English
- ISSNs:
- 2160-1836
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22163.xml