Distance‐based clustering of mixed data. (5th November 2018)
- Record Type:
- Journal Article
- Title:
- Distance‐based clustering of mixed data. (5th November 2018)
- Main Title:
- Distance‐based clustering of mixed data
- Authors:
- van de Velden, Michel
Iodice D'Enza, Alfonso
Markos, Angelos
- Abstract:
- Abstract: Cluster analysis comprises several unsupervised techniques aiming to identify a subgroup (cluster) structure underlying the observations of a data set. The desired cluster allocation is such that it assigns similar observations to the same subgroup. Depending on the field of application and on domain‐specific requirements, different approaches exist that tackle the clustering problem. In distance‐based clustering, a distance metric is used to determine the similarity between data objects. The distance metric can be used to cluster observations by considering the distances between objects directly or by considering distances between objects and cluster centroids (or some other cluster representative points). Most distance metrics, and hence the distance‐based clustering methods, work either with continuous‐only or categorical‐only data. In applications, however, observations are often described by a combination of both continuous and categorical variables. Such data sets can be referred to as mixed or mixed‐type data. In this review, we consider different methods for distance‐based cluster analysis of mixed data. In particular, we distinguish three different streams that range from basic data preprocessing (where all variables are converted to the same scale), to the use of specific distance measures for mixed data, and finally to so‐called joint data reduction (a combination of dimension reduction and clustering) methods specifically designed for mixed data.
This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis; Statistical and Graphical Methods of Data Analysis > Dimension Reduction
Abstract: An example of mixed data clustering.
- Is Part Of:
- Wiley interdisciplinary reviews. Volume 11:Number 3(2019)
- Journal:
- Wiley interdisciplinary reviews
- Issue:
- Volume 11:Number 3(2019)
- Issue Display:
- Volume 11, Issue 3 (2019)
- Year:
- 2019
- Volume:
- 11
- Issue:
- 3
- Issue Sort Value:
- 2019-0011-0003-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2018-11-05
- Subjects:
- cluster analysis -- dimension reduction -- distance based methods -- joint dimension reduction and clustering -- mixed data
Mathematical statistics -- Data processing -- Periodicals
Science -- Data processing -- Periodicals
Social sciences -- Data processing -- Periodicals
Mathematical statistics -- Periodicals
519.50285
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1939-0068
http://www3.interscience.wiley.com/journal/122458798/home
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/wics.1456
- Languages:
- English
- ISSNs:
- 1939-5108
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23747.xml