Efficient top-k recently-frequent term querying over spatio-temporal textual streams. Issue 97 (March 2021)
- Record Type:
- Journal Article
- Title:
- Efficient top-k recently-frequent term querying over spatio-temporal textual streams. Issue 97 (March 2021)
- Main Title:
- Efficient top-k recently-frequent term querying over spatio-temporal textual streams
- Authors:
- Dam, Thu-Lan
Chester, Sean
Nørvåg, Kjetil
Duong, Quang-Huy
- Abstract:
- Massive amounts of data with spatio-temporal-textual information are being generated due to the proliferation of GPS-equipped mobile devices. Much of this data are social media posts, often used to share and spread personal updates and news. Exploring valuable information from a dynamic collection of social posts is of great interest and has attracted many studies. However, because the size of data is huge, the existing methods mostly work with the time window model where the old data is discarded. In this work, we introduce the task of efficiently discovering the top-k most popular terms within a user specified bounded region over a stream of social posts, where the recent posts are more important than the old ones. To make this feasible, we propose a hybrid index structure and algorithms to efficiently answer such top-k queries. Our index employs a spatial index augmented by top-k time-weighted term lists and a bulk updating technique to support fast digestion of social post streams. Further, these top-k term lists are employed in the aggregation step to produce the final results so that incoming queries can be efficiently processed. An extensive experimental study with a large collection of social posts shows that the proposed methods are capable of both online aggregation and accurate query processing.
Highlights:
Introduce a location-based time-decaying query to retrieve recently frequent terms within a user specified region of interest, and we propose both exact and approximate algorithms to address it efficiently.
Introduce the time-weighted term list structure to enable both quad-trees and R-trees to index social post streams.
Demonstrate how to support fast digestion of social streams with a batch insertion of simultaneous Morton encoding and time-weighted frequency pre-calculation.
Extensive experimental evaluation was performed to evaluate query response performance and index cost.
The results show that our methods are highly efficient in terms of query response time, accuracy and scalability.
- Is Part Of:
- Information systems. Issue 97(2021)
- Journal:
- Information systems
- Issue:
- Issue 97(2021)
- Issue Display:
- Volume 97, Issue 97 (2021)
- Year:
- 2021
- Volume:
- 97
- Issue:
- 97
- Issue Sort Value:
- 2021-0097-0097-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-03
- Subjects:
- Frequent terms -- Time-weighted -- Spatio-temporal query -- Top-k query -- Spatial index -- Spatio-temporal textual stream
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2020.101687
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15425.xml