Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer. Issue 1 (December 2016)
- Record Type:
- Journal Article
- Title:
- Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer. Issue 1 (December 2016)
- Main Title:
- Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer
- Authors:
- Okimoto, Gordon
Zeinalzadeh, Ashkan
Wenska, Tom
Loomis, Michael
Nation, James
Fabre, Tiphaine
Tiirikainen, Maarit
Hernandez, Brenda
Chan, Owen
Wong, Linda
Kwee, Sandi
- Abstract:
- Background: Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of a MMDS should provide a more focused view of the biology underlying complex diseases such as cancer that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm that jointly analyzes the data matrices of a MMDS using sparse matrix approximations of rank-1.
Methods: The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer-products composed of "sparse" left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that in aggregate, or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single "sparsity" parameter that is selected based on false discovery rate estimated by permutation testing. Multiple signals of interest in a given MMDS are sequentially detected and modeled by iterating JAMMIT on "residual" data matrices that result from a given sparse approximation.
Results: We show that JAMMIT outperforms other joint analysis algorithms in the detection of multiple signatures embedded in simulated MMDS. On real multimodal data for ovarian and liver cancer we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology.
Conclusions: Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple, big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired.
- Is Part Of:
- Biodata mining. Volume 9:Issue 1(2016)
- Journal:
- Biodata mining
- Issue:
- Volume 9:Issue 1(2016)
- Issue Display:
- Volume 9, Issue 1 (2016)
- Year:
- 2016
- Volume:
- 9
- Issue:
- 1
- Issue Sort Value:
- 2016-0009-0001-0000
- Page Start:
- 1
- Page End:
- 28
- Publication Date:
- 2016-12
- Subjects:
- Generalized singular value decomposition -- Joint data analysis -- Ovarian cancer -- Hepatocellular carcinoma -- The Cancer Genome Atlas -- LASSO -- Sparse signal detection
Bioinformatics -- Periodicals
Computational biology -- Periodicals
Data mining -- Periodicals
570.285
- Journal URLs:
- http://www.biodatamining.org/
http://link.springer.com/
- DOI:
- 10.1186/s13040-016-0103-7
- Languages:
- English
- ISSNs:
- 1756-0381
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9879.xml
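
The Methods part of the abstract above describes approximating several data matrices by rank-1 outer products that share one right-singular vector while each matrix keeps its own sparse left-singular vector. A minimal sketch of that general idea follows. It is not the authors' JAMMIT implementation: it assumes a generic soft-thresholded power iteration on the row-stacked matrices, and the function names (`soft_threshold`, `sparse_rank1`), the fixed threshold `lam`, and the simulated data are all hypothetical choices made for illustration. The permutation-based choice of the sparsity parameter mentioned in the abstract is not shown.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink each entry of x toward zero by lam; small entries become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank1(matrices, lam, n_iter=200, tol=1e-8):
    """Illustrative sparse rank-1 fit with a shared right vector (assumed scheme).

    matrices: list of 2-D arrays, each (variables x samples), same column count.
    Returns one sparse left vector per matrix and the common unit-norm vector v.
    """
    stacked = np.vstack(matrices)
    # Start from the leading right-singular vector of the stacked data.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    v = vt[0]
    for _ in range(n_iter):
        u = soft_threshold(stacked @ v, lam)
        v_new = stacked.T @ u
        norm = np.linalg.norm(v_new)
        if norm == 0.0:  # threshold removed every variable; keep the current v
            break
        v_new = v_new / norm
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    u = soft_threshold(stacked @ v, lam)
    # Cut the stacked left vector back into one piece per data type.
    cuts = np.cumsum([m.shape[0] for m in matrices])[:-1]
    return np.split(u, cuts), v


# Simulated example (hypothetical data): two data types measured on 30 samples,
# with a common signal planted in the first five rows of each matrix.
rng = np.random.default_rng(0)
signal = rng.standard_normal(30)
signal /= np.linalg.norm(signal)
data_a = 0.1 * rng.standard_normal((50, 30))
data_b = 0.1 * rng.standard_normal((40, 30))
data_a[:5] += 10.0 * signal
data_b[:5] += 10.0 * signal

(u_a, u_b), v = sparse_rank1([data_a, data_b], lam=1.0)
signature_a = np.flatnonzero(u_a)  # indices of selected variables, data type A
signature_b = np.flatnonzero(u_b)  # indices of selected variables, data type B

# "Residual" matrices, on which the same fit could be repeated to look for
# further signals, in the spirit of the iteration the abstract describes.
residual_a = data_a - np.outer(u_a, v)
residual_b = data_b - np.outer(u_b, v)
```

In this sketch the non-zero entries of `u_a` and `u_b` play the role of the per-data-type signatures, and `v` plays the role of the shared signal across samples. Larger values of `lam` select fewer variables.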