EDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity. (9th March 2021)
- Record Type:
- Journal Article
- Title:
- EDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity. (9th March 2021)
- Main Title:
- EDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity
- Authors:
- Mousavi‐Derazmahalleh, Mahsa
Stott, Audrey
Lines, Rose
Peverley, Georgia
Nester, Georgia
Simpson, Tiffany
Zawierta, Michal
De La Pierre, Marco
Bunce, Michael
Christophersen, Claus T.
- Abstract:
- Metabarcoding of environmental DNA (eDNA), when coupled with high‐throughput sequencing, is revolutionising the way biodiversity can be monitored across a wide range of applications. However, the large number of tools deployed in downstream bioinformatic analyses often makes a workflow challenging to configure and maintain, and consequently limits research reproducibility. Furthermore, scalability needs to be considered to handle the growing amount of data resulting from increased sequencing output and project scale. Here, we describe eDNAFlow, a fully automated workflow that employs a number of state‐of‐the‐art applications to process eDNA data from raw sequences (single‐end or paired‐end) through to the generation of curated and noncurated zero‐radius operational taxonomic units (ZOTUs) and their abundance tables. The pipeline is based on Nextflow and Singularity, which enable a scalable, portable and reproducible workflow using software containers on a local computer, clouds and high‐performance computing (HPC) clusters. Finally, we present an in‐house Python script to assign taxonomy to ZOTUs based on user‐specified thresholds for assigning the lowest common ancestor (LCA). We demonstrate the utility and efficiency of the pipeline using an example of a published coral diversity biomonitoring study. Our results were congruent with the aforementioned study. The scalability of the pipeline is also demonstrated through analysis of a large data set containing 154 samples. To our knowledge, this is the first automated bioinformatic pipeline for eDNA analysis using two powerful tools: Nextflow and Singularity. This pipeline addresses two major challenges in the analysis of eDNA data: scalability and reproducibility. …
- Is Part Of:
- Molecular ecology resources. Volume 21:Number 5(2021)
- Journal:
- Molecular ecology resources
- Issue:
- Volume 21:Number 5(2021)
- Issue Display:
- Volume 21, Issue 5 (2021)
- Year:
- 2021
- Volume:
- 21
- Issue:
- 5
- Issue Sort Value:
- 2021-0021-0005-0000
- Page Start:
- 1697
- Page End:
- 1704
- Publication Date:
- 2021-03-09
- Subjects:
- environmental DNA -- metabarcoding -- Nextflow -- Singularity
Molecular ecology -- Periodicals
572.8
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1755-0998
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/1755-0998.13356
- Languages:
- English
- ISSNs:
- 1755-098X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.817368
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 17556.xml