QSJoin: a new string similarity join method based on Q-sample and statistical features. (25th March 2019)
- Record Type:
- Journal Article
- Title:
- QSJoin: a new string similarity join method based on Q-sample and statistical features. (25th March 2019)
- Main Title:
- QSJoin: a new string similarity join method based on Q-sample and statistical features
- Authors:
- Wang, Xiaoxia
Sun, Decai
Wu, Bo
Ji, Puzhao
- Abstract:
- Similarity join is an essential operation in big data analytics, used in tasks such as data integration and data cleaning. In this paper, we propose a new algorithm, called QSJoin, to support efficient string similarity joins by reducing the shuffle cost and transmission cost in MapReduce. Our algorithm employs a filter-verify framework. In filtration, a new signature scheme based on q-samples is adopted to decrease the number of generated signatures, and then a large number of dissimilar pairs are discarded with a Standard-Match filter. In verification, a multi-vector filter scheme is adopted to eliminate more dissimilar pairs using statistical features, and then the final true pairs are extracted by verifying the candidate pairs with a length-aware verification method. Experimental results on real-world datasets show that our algorithm achieves high performance and outperforms state-of-the-art approaches.
- Is Part Of:
- International journal of arts and technology. Volume 11:Number 3(2019)
- Journal:
- International journal of arts and technology
- Issue:
- Volume 11:Number 3(2019)
- Issue Display:
- Volume 11, Issue 3 (2019)
- Year:
- 2019
- Volume:
- 11
- Issue:
- 3
- Issue Sort Value:
- 2019-0011-0003-0000
- Page Start:
- 285
- Page End:
- 308
- Publication Date:
- 2019-03-25
- Subjects:
- string similarity join -- MapReduce -- Q-sample -- statistical feature -- data integration
Technology and the arts -- Periodicals
Computer art -- Periodicals
700.285
- Journal URLs:
- http://inderscience.metapress.com/content/121164
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1754-8853
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12391.xml
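
Illustrative note on the abstract: the filter-verify framework it describes can be sketched on a single machine as below. This is a minimal sketch under stated assumptions, not the QSJoin algorithm itself. The record gives no details of the Standard-Match filter, the multi-vector filter, or the MapReduce layout, so the sketch swaps in a generic q-sample count filter and a plain thresholded edit distance as the verifier. All function names and the parameters q and tau (the edit-distance threshold) are hypothetical.

```python
from itertools import combinations


def q_samples(s, q):
    """Non-overlapping q-grams of s taken at positions 0, q, 2q, ..."""
    return [s[i:i + q] for i in range(0, len(s) - q + 1, q)]


def q_grams(s, q):
    """Set of all overlapping q-grams of s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def passes_filter(r, s, q, tau):
    """Cheap necessary condition for edit_distance(r, s) <= tau.

    Assumption: a generic count filter, not the paper's Standard-Match
    filter. Each edit operation can destroy at most one of the disjoint
    q-samples of s, so at least len(samples) - tau of them must occur in r.
    """
    if abs(len(r) - len(s)) > tau:  # length filter
        return False
    samples = q_samples(s, q)
    needed = len(samples) - tau
    if needed <= 0:  # too few samples to prune anything
        return True
    grams = q_grams(r, q)
    matched = sum(1 for g in samples if g in grams)
    return matched >= needed


def edit_distance_within(r, s, tau):
    """Return the edit distance of r and s if it is <= tau, else None."""
    if abs(len(r) - len(s)) > tau:
        return None
    prev = list(range(len(s) + 1))
    for i, cr in enumerate(r, 1):
        cur = [i]
        for j, cs in enumerate(s, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cr != cs)))  # match or substitution
        if min(cur) > tau:  # every alignment already exceeds the threshold
            return None
        prev = cur
    return prev[-1] if prev[-1] <= tau else None


def similarity_join(strings, q=2, tau=1):
    """Self-join: all pairs whose edit distance is at most tau."""
    result = []
    for r, s in combinations(strings, 2):
        # Filter first (cheap), verify only the surviving candidates.
        if passes_filter(r, s, q, tau) and edit_distance_within(r, s, tau) is not None:
            result.append((r, s))
    return result


if __name__ == "__main__":
    print(similarity_join(["kitten", "sitten", "mitten", "banana"], q=2, tau=1))
```

The filter only ever discards pairs that cannot be within the threshold, so the verifier decides the final answer. In a MapReduce setting the signatures would presumably serve as shuffle keys, which is why generating fewer of them lowers shuffle cost, but that mapping is an assumption here rather than something stated in the record.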