A K-partitioning algorithm for clustering large-scale spatio-textual data. (March 2017)
- Record Type:
- Journal Article
- Title:
- A K-partitioning algorithm for clustering large-scale spatio-textual data. (March 2017)
- Main Title:
- A K-partitioning algorithm for clustering large-scale spatio-textual data
- Authors:
- Choi, Dong-Wan
Chung, Chin-Wan
- Abstract:
- The volume of spatio-textual data is drastically increasing in these days, and this makes more and more essential to process such a large-scale spatio-textual dataset. Even though numerous works have been studied for answering various kinds of spatio-textual queries, the analyzing method for spatio-textual data has rarely been considered so far. Motivated by this, this paper proposes a k-means based clustering algorithm specialized for a massive spatio-textual data. One of the strong points of the k-means algorithm lies in its efficiency and scalability, implying that it is appropriate for a large-scale data. However, it is challenging to apply the normal k-means algorithm to spatio-textual data, since each spatio-textual object has non-numeric attributes, that is, textual dimension, as well as numeric attributes, that is, spatial dimension. We address this problem by using the expected distance between a random pair of objects rather than constructing actual centroid of each cluster. Based on our experimental results, we show that the clustering quality of our algorithm is comparable to those of other k-partitioning algorithms that can process spatio-textual data, and its efficiency is superior to those competitors.
Highlights: The problem of clustering large-scale spatio-textual data is firstly studied. It has many real applications like location-based data cleaning. A modified version of the k-means clustering algorithm is developed for spatio-textual data using the expected pairwise distance. Experimentally, our algorithm is not only fast enough to tackle a massive spatio-textual dataset, but also fairly effective in terms of the quality.
- Is Part Of:
- Information systems. Volume 64(2017)
- Journal:
- Information systems
- Issue:
- Volume 64(2017)
- Issue Display:
- Volume 64, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 64
- Issue:
- 2017
- Issue Sort Value:
- 2017-0064-2017-0000
- Page Start:
- 1
- Page End:
- 11
- Publication Date:
- 2017-03
- Subjects:
- Spatio-textual similarity -- K-means clustering -- K-medoids clustering -- K-prototypes clustering -- Expected distance -- Grid partitioning
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2016.08.003
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1232.xml