Principal component analysis for bar charts and metabins tables. (20th May 2013)
- Record Type:
- Journal Article
- Title:
- Principal component analysis for bar charts and metabins tables. (20th May 2013)
- Main Title:
- Principal component analysis for bar charts and metabins tables
- Authors:
- Diday, Edwin
- Abstract:
- In recent years, the analysis of symbolic data where the units are categories, classes, or concepts described by intervals, distributions, sets of categories, and the like becomes a challenging task since many application fields generate complex and massive amounts of data that are difficult to analyze with traditional techniques. In this article, we propose a strategy for extending standard principal component analysis (PCA) to such data in the case where the variables values are 'bar charts' (i.e., a set of categories called bins with their relative frequencies). First, we introduce 'metabins' which mix together bins of the different bar charts and enhance interpretability. Standard PCA applied on the bins of such data tables can lose the bar chart constraints and suppose independencies between the bins. Therefore, we introduce a 'Copular PCA' as copulas take care of the probabilities and the underlying dependencies. Some theoretical results lead to the representation of the bar chart variables inside a hypercube covering the correlation sphere of a PCA applied on the bins. We give several ways for representing individuals and pathways of individuals × metabins or individuals × variables. Several tools of interpretation of such representations based on 'coherency' of metabins (or variables) among a trajectory (i.e., oriented pathway) of individuals and 'diversity' of individuals among a trajectory of metabins (or variables) are illustrated by some simple examples. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013
- Is Part Of:
- Statistical analysis and data mining. Volume 6:Number 5(2013)
- Journal:
- Statistical analysis and data mining
- Issue:
- Volume 6:Number 5(2013)
- Issue Display:
- Volume 6, Issue 5 (2013)
- Year:
- 2013
- Volume:
- 6
- Issue:
- 5
- Issue Sort Value:
- 2013-0006-0005-0000
- Page Start:
- 403
- Page End:
- 430
- Publication Date:
- 2013-05-20
- Subjects:
- Data mining -- Statistical methods -- Periodicals
- 006.312
- Journal URLs:
- http://www3.interscience.wiley.com/journal/112701062/home
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/sam.11188
- Languages:
- English
- ISSNs:
- 1932-1864
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8447.424100
- British Library DSC - BLDSS-3PM
- British Library HMNTS - ELD Digital store
- Ingest File:
- 3971.xml