GeoDenStream: An improved DenStream clustering method for managing entity data within geographical data streams. (November 2020)
- Record Type:
- Journal Article
- Title:
- GeoDenStream: An improved DenStream clustering method for managing entity data within geographical data streams. (November 2020)
- Main Title:
- GeoDenStream: An improved DenStream clustering method for managing entity data within geographical data streams
- Authors:
- Li, Manqi
Croitoru, Arie
Yue, Songshan
- Abstract:
- Data streams have become an integral part of the rapidly evolving modern information landscape in various application domains. Stream clustering, and in particular density-based clustering, has emerged as one of the most commonly used data stream analysis tasks. Several density-based stream clustering methods have been proposed; chief among them is DenStream. Existing DenStream clustering methods usually preserve only the key summary descriptors about each cluster, such as the center and radius. Such an approach is not suitable for streams that observe discrete entities, since the clustering process does not maintain the entity-level composition of each cluster over time. The primary challenge we explore in this paper is therefore how existing DenStream clustering methods can be enhanced to support entity-based stream mining in geographical space. In view of this consideration, this paper presents GeoDenStream, a spatiotemporal entity-based stream clustering method. Building on DenStream, GeoDenStream is particularly suitable for clustering discrete entities due to its ability to track the relationship between entities and clusters over time and its ability to recover data that has been incorrectly labeled as noise. Memory efficiency in GeoDenStream is achieved by using a combination of data pruning and indexing. The performance of GeoDenStream was evaluated with both synthetic and real-world stream data from a popular social media platform (Twitter). The results of these evaluations show that GeoDenStream is able to efficiently handle memory constraints, overlapping data points, and false noise.
Highlights: A clustering method for entity-based data streams with geographical information. Information on entity-cluster relationships is preserved over space and time. Memory use and handling of overlapping points and false noise are enhanced. Clustering synthetic and real stream data demonstrates improvement in performance. …
- Is Part Of:
- Computers & geosciences. Volume 144(2020)
- Journal:
- Computers & geosciences
- Issue:
- Volume 144(2020)
- Issue Display:
- Volume 144, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 144
- Issue:
- 2020
- Issue Sort Value:
- 2020-0144-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-11
- Subjects:
- GeoDenStream -- DenStream -- Clustering -- Geographical data stream -- Spatiotemporal analysis
Environmental policy -- Periodicals
550.5
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00983004
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cageo.2020.104563
- Languages:
- English
- ISSNs:
- 0098-3004
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.695000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14612.xml