A random version of principal component analysis in data clustering. (April 2018)
- Record Type:
- Journal Article
- Title:
- A random version of principal component analysis in data clustering. (April 2018)
- Main Title:
- A random version of principal component analysis in data clustering
- Authors:
- Palese, Luigi Leonardo
- Abstract:
- Highlights: Dimensionality reduction techniques are extremely important in science today. PCA is widely used, but it requires knowledge of the covariance matrix. Algorithms based on random projection are increasingly used. A new random projection algorithm, very similar to PCA, is proposed. It is easy to implement, conceptually simple, and numerically robust.
Abstract: Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance/correlation matrix of the analyzed data. However, to work properly with high-dimensional data sets, PCA imposes severe mathematical constraints on the minimum number of different replicates, or samples, that must be included in the analysis. Generally, improper sampling is due to a small number of data points relative to the number of degrees of freedom that characterize the ensemble. In the life sciences it is often important to have an algorithm that can accept poorly dimensioned data sets, including degenerate ones. Here a new random projection algorithm is proposed, in which a random symmetric matrix serves as a surrogate for the covariance/correlation matrix of PCA, while maintaining the data clustering capacity. We demonstrate that what is important for the clustering efficiency of PCA is not the exact form of the covariance/correlation matrix, but simply its symmetry.
- Is Part Of:
- Computational biology and chemistry. Volume 73(2018)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 73(2018)
- Issue Display:
- Volume 73, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 73
- Issue:
- 2018
- Issue Sort Value:
- 2018-0073-2018-0000
- Page Start:
- 57
- Page End:
- 64
- Publication Date:
- 2018-04
- Subjects:
- Principal component analysis -- Random projection -- Dimensionality reduction -- Data clustering -- Protein structure -- Structural bioinformatics
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2018.01.009
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 20965.xml
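
The central idea in the abstract, using a random symmetric matrix in place of the covariance/correlation matrix of PCA, can be illustrated with a minimal sketch. This is one reading of the abstract only, not the algorithm as published: the function name `random_symmetric_projection`, the Gaussian entries, the symmetrization `(A + A.T) / 2`, and the choice to keep the eigenvectors with the largest eigenvalues are all assumptions made for illustration.

```python
import numpy as np


def random_symmetric_projection(X, n_components=2, seed=0):
    """Project the rows of X onto eigenvectors of a random symmetric matrix.

    Hypothetical helper: it mimics the PCA workflow but never computes
    the covariance matrix of X.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Center each variable (column), as is done before ordinary PCA.
    Xc = X - X.mean(axis=0)
    d = Xc.shape[1]

    # Assumed construction of the surrogate: symmetrize a Gaussian matrix.
    A = rng.standard_normal((d, d))
    S = (A + A.T) / 2.0

    # A real symmetric matrix has real eigenvalues and orthonormal
    # eigenvectors; eigh returns the eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:n_components]]

    # Coordinates of each sample in the random orthonormal basis.
    return Xc @ W


if __name__ == "__main__":
    # Toy data: two groups of 5 samples in 30 dimensions, so there are
    # fewer samples than variables.
    rng = np.random.default_rng(42)
    group_a = rng.normal(0.0, 0.1, size=(5, 30))
    group_b = rng.normal(1.0, 0.1, size=(5, 30))
    scores = random_symmetric_projection(np.vstack([group_a, group_b]))
    print(scores.shape)
```

Because the projection basis does not depend on the data, the sketch runs unchanged when there are fewer samples than variables, which is the poorly dimensioned case the abstract highlights. Whether the groups remain separated in the projected coordinates depends on the data and on the random draw.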