SEaCorAl: Identifying and contrasting the regulation-correlation bias in RNA-Seq paired expression data of patient groups. (August 2021)
- Record Type:
- Journal Article
- Title:
- SEaCorAl: Identifying and contrasting the regulation-correlation bias in RNA-Seq paired expression data of patient groups. (August 2021)
- Main Title:
- SEaCorAl: Identifying and contrasting the regulation-correlation bias in RNA-Seq paired expression data of patient groups
- Authors:
- Petti, Manuela
Verrienti, Antonella
Paci, Paola
Farina, Lorenzo
- Abstract:
- The Cancer Genome Atlas database offers the possibility of analyzing genome-wide expression RNA-Seq cancer data using paired counts, that is, studies where expression data are collected in pairs of normal and cancer cells, by taking samples from the same individual. Correlation of gene expression profiles is the most common analysis to study co-expression groups, which is used to find biological interpretation of -omics big data. The aim of the paper is threefold: firstly we show for the first time, the presence of a "regulation-correlation bias" in RNA-Seq paired expression data, that is an artifactual link between the expression status (up- or down-regulation) of a gene pair and the sign of the corresponding correlation coefficient. Secondly, we provide a statistical model able to theoretically explain the reasons for the presence of such a bias. Thirdly, we present a bias-removal algorithm, called SEaCorAl, able to effectively reduce bias effects and improve the biological significance of correlation analysis. Validation of the SEaCorAl algorithm is performed by showing a significant increase in the ability to detect biologically meaningful associations of positive correlations and a significant increase of the modularity of the resulting unbiased correlation network.
Highlights: RNA-Seq paired expression data are affected by an artifactual link between genes regulation status and their correlation sign. A statistical model able to theoretically explain the reason for the presence of such regulation-correlation bias is proposed. SEaCorAl algorithm, able to reduce bias effects and improve the biological significance of correlation analysis, is presented.
- Is Part Of:
- Computers in biology and medicine. Volume 135(2021)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 135(2021)
- Issue Display:
- Volume 135, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 135
- Issue:
- 2021
- Issue Sort Value:
- 2021-0135-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-08
- Subjects:
- Correlation networks -- Correlation analysis -- Spurious correlations -- RNA-Seq data -- Paired data
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2021.104567
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18878.xml
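
The "regulation-correlation bias" named in the abstract is a link between the regulation status of a gene pair and the sign of their correlation. The sketch below illustrates one simple way such a link can arise. It is not the paper's statistical model and not the SEaCorAl algorithm. It assumes hypothetical genes whose expression is independent noise, with a mean shift in tumour samples, and it computes Pearson correlation over the pooled normal and tumour samples. The function names, shift size, and sample count are all illustrative assumptions, and the pairing of samples within a patient is not modelled.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_gene(rng, n_patients, shift):
    """Hypothetical gene: unit-variance noise around 0 in normal samples
    and around `shift` in tumour samples (assumption for illustration)."""
    normal = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
    tumour = [rng.gauss(shift, 1.0) for _ in range(n_patients)]
    return normal, tumour


def pooled(gene):
    """Concatenate normal and tumour values into one profile."""
    normal, tumour = gene
    return normal + tumour


rng = random.Random(0)
n = 500

# Three genes generated independently of each other.
up_a = simulate_gene(rng, n, +2.0)    # up-regulated in tumour
up_b = simulate_gene(rng, n, +2.0)    # up-regulated in tumour
down_c = simulate_gene(rng, n, -2.0)  # down-regulated in tumour

# Correlation over pooled samples follows the regulation status.
r_up_up = pearson(pooled(up_a), pooled(up_b))
r_up_down = pearson(pooled(up_a), pooled(down_c))

# Within the tumour samples alone the genes are uncorrelated by construction.
r_within_tumour = pearson(up_a[1], up_b[1])

print(f"up/up pooled:    {r_up_up:+.2f}")
print(f"up/down pooled:  {r_up_down:+.2f}")
print(f"up/up in tumour: {r_within_tumour:+.2f}")
```

Under these assumptions the pooled correlation is positive for two up-regulated genes and negative for an up/down pair (about +0.5 and -0.5 in expectation for the chosen shifts), even though the genes share no dependence within either condition. The sign is driven entirely by the shared shift between conditions.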