Reproducible RNA‐seq data processing and analysis tools in the cloud. (December 2021)
- Record Type:
- Journal Article
- Title:
- Reproducible RNA‐seq data processing and analysis tools in the cloud. (December 2021)
- Main Title:
- Reproducible RNA‐seq data processing and analysis tools in the cloud
- Authors:
- Poehlman, William L.
Montgomery, Kelsey S.
Gockley, Jake
Thyer, Tess
Kauer, Nicole
Eddy, James
Peters, Mette A
Sieberts, Solveig K
Woo, Kara H
Greenwood, Anna K
Omberg, Larsson
Mangravite, Lara M
- Abstract:
- Background: Discovering heterogeneous biological processes underlying Alzheimer's Disease (AD) is key to prioritizing potential drug candidates. Analysis of RNA sequencing (RNA‐seq) data can improve our understanding of these processes by revealing gene expression patterns associated with AD. Analyzing these data requires processing large volumes of sequencing files, as well as downstream analysis in a secure compute environment. To ensure reliable results, software must be executed consistently so that results can be reproduced in different environments. To help address these challenges, we have developed reproducible bioinformatic tools for raw data processing and analysis in the Amazon Web Services (AWS) cloud compute environment. Method: We have implemented an RNA‐seq processing pipeline in Common Workflow Language (CWL). Raw sequencing reads in FASTQ or BAM format are aligned to the reference genome using the STAR read aligner, and gene counts are then quantified (https://github.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq). In addition, we have developed an R package for gene count normalization (https://github.com/Sage-Bionetworks/sageseqr). To enable execution of these tools, we provide an analytical workspace in a secure AWS compute environment (https://adknowledgeportal.synapse.org/Analytical%20Workspace). Result: We have used these tools to reprocess data from several RNA‐seq studies that are available through the AD Knowledge Portal (adknowledgeportal.org) as the RNAseq Harmonization Study. As new datasets are generated, they can be processed in the same software environment, enabling cross‐study analysis. Because data processing is reproducible, users can process data from similar RNA‐seq experiments without implementing new pipelines. Conclusion: The development of reproducible RNA‐seq processing and analysis tools provides a valuable resource for the AD research community. Although we have demonstrated execution of these tools in the cloud, they may also be executed in other environments, such as high-performance computing clusters. Our tools will remain stable resources for reproducible processing of RNA‐seq datasets as infrastructure evolves.
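The Method section mentions gene count normalization with the sageseqr R package. As a rough illustration of what count normalization involves, the Python sketch below computes counts per million (CPM), one common library-size scaling. This is an assumption for illustration only and does not describe the method sageseqr actually implements; the function name and the dict-of-lists input layout are hypothetical.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each sample's raw gene counts by its library size, per million reads.

    Hypothetical layout: `counts` maps a sample ID to a list of per-gene read
    counts, with genes listed in the same order for every sample.
    """
    cpm: Dict[str, List[float]] = {}
    for sample, gene_counts in counts.items():
        # Library size = total reads assigned to genes in this sample.
        library_size = sum(gene_counts)
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no counts")
        # Multiply before dividing so exact multiples stay exact floats.
        cpm[sample] = [c * 1_000_000 / library_size for c in gene_counts]
    return cpm


if __name__ == "__main__":
    raw = {"s1": [10, 30, 60], "s2": [5, 5, 90]}
    print(counts_per_million(raw))
```

After scaling, each sample's values sum to one million, so expression levels become comparable across samples sequenced to different depths. Production pipelines typically add further steps (for example, filtering lowly expressed genes or correcting for covariates) that this sketch omits.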
- Is Part Of:
- Alzheimer's & dementia. Volume 17 (2021), Supplement 3
- Journal:
- Alzheimer's & dementia
- Issue:
- Volume 17 (2021), Supplement 3
- Issue Display:
- Volume 17, Issue 3 (2021)
- Year:
- 2021
- Volume:
- 17
- Issue:
- 3
- Issue Sort Value:
- 2021-0017-0003-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2021-12
- Subjects:
- Alzheimer's disease -- Periodicals
Alzheimer Disease -- Periodicals
Dementia -- Periodicals
Dementia
Alzheimer's disease
Electronic periodical (form descriptor)
Internet resource (form descriptor)
616.83
- Journal URLs:
- http://www.sciencedirect.com/science/journal/15525260
http://www.elsevier.com/journals
- DOI:
- 10.1002/alz.056527
- Languages:
- English
- ISSNs:
- 1552-5260
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 0806.255333
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20531.xml