"Escalibur"—A practical pipeline for the de novo analysis of nucleotide variation in nonmodel eukaryotes. (2nd March 2022)
- Record Type:
- Journal Article
- Title:
- "Escalibur"—A practical pipeline for the de novo analysis of nucleotide variation in nonmodel eukaryotes. (2nd March 2022)
- Main Title:
- "Escalibur"—A practical pipeline for the de novo analysis of nucleotide variation in nonmodel eukaryotes
- Authors:
- Korhonen, Pasi K.
Shaban, Babak
Faux, Noel G.
Kinkar, Liina
Chang, Bill C. H.
Wang, Daxi
Yang, Bicheng
Young, Neil D.
Gasser, Robin B.
- Abstract:
- Abstract: The revolution in genomics has enabled large‐scale population genetic investigations of a wide range of organisms, but there has been a relatively limited focus on improving analytical pipelines. To efficiently analyse large data sets, highly integrated and automated software pipelines, which are easy to use, efficient, reliable, reproducible and run in multiple computational environments, are required. A number of software workflows have been developed to handle and process such data sets for population genetic analyses, but effective, specialized pipelines for genetic and statistical analyses of nonmodel organisms are lacking. For most species, resources for variomes (sets of genetic variations found in populations of species) are not available, and/or genome assemblies are often incomplete and fragmented, complicating the selection of the most suitable reference genome when multiple assemblies are available. Additionally, the biological samples used often contain extraneous DNA from sources other than the species under investigation (e.g., microbial contamination), which needs to be removed prior to genetic analyses. For these reasons, we established a new pipeline, called Escalibur, which includes functionalities such as data trimming and mapping; selection of a suitable reference genome; removal of contaminating read data; recalibration of base calls; and variant‐calling. Escalibur uses the proven GATK variant caller and the Workflow Description Language (WDL) and is, therefore, a highly efficient and scalable pipeline for the genome‐wide identification of nucleotide variation in eukaryotes. This pipeline is available at https://gitlab.unimelb.edu.au/bioscience/escalibur (version 0.3‐beta) and is essentially applicable to any prokaryote or eukaryote.
- Is Part Of:
- Molecular ecology resources. Volume 22:Number 5(2022)
- Journal:
- Molecular ecology resources
- Issue:
- Volume 22:Number 5(2022)
- Issue Display:
- Volume 22, Issue 5 (2022)
- Year:
- 2022
- Volume:
- 22
- Issue:
- 5
- Issue Sort Value:
- 2022-0022-0005-0000
- Page Start:
- 2120
- Page End:
- 2126
- Publication Date:
- 2022-03-02
- Subjects:
- bioinformatics/workflows -- molecular evolution -- parasitology -- population genetics
Molecular ecology -- Periodicals
572.8
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1755-0998
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/1755-0998.13600
- Languages:
- English
- ISSNs:
- 1755-098X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.817368
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 21778.xml