A highly scalable parallel algorithm for maximally informative k-itemset mining. Issue 1 (January 2017)
- Record Type:
- Journal Article
- Title:
- A highly scalable parallel algorithm for maximally informative k-itemset mining. Issue 1 (January 2017)
- Main Title:
- A highly scalable parallel algorithm for maximally informative k-itemset mining
- Authors:
- Salah, Saber
Akbarinia, Reza
Masseglia, Florent
- Abstract:
- The discovery of informative itemsets is a fundamental building block in data analytics and information retrieval. While the problem has been widely studied, only a few solutions scale. This is particularly the case when (1) the data set is massive, calling for large-scale distribution, and/or (2) the length k of the informative itemset to be discovered is high. In this paper, we address the problem of parallel mining of maximally informative k-itemsets (miki) based on joint entropy. We propose PHIKS (Parallel Highly Informative K-ItemSet), a highly scalable, parallel miki mining algorithm. PHIKS renders the mining process of large-scale databases (up to terabytes of data) succinct and effective. Its mining process is made up of only two efficient parallel jobs. With PHIKS, we provide a set of significant optimizations for calculating the joint entropies of miki having different sizes, which drastically reduce the execution time, the communication cost and the energy consumption in a distributed computational platform. PHIKS has been extensively evaluated using massive real-world data sets. Our experimental results confirm the effectiveness of our proposal through the significant scale-up obtained with high itemset lengths and over very large databases.
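For readers unfamiliar with the measure named in the abstract, the following is a minimal single-machine sketch of joint entropy over an itemset and a brute-force search for the k-itemset that maximises it. It is an illustration only, not the PHIKS algorithm and not the authors' code: the function names `joint_entropy` and `brute_force_miki`, the representation of transactions as Python sets, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(transactions, itemset):
    """Joint entropy, in bits, of the presence/absence of each item in `itemset`.

    Each transaction is projected onto the itemset as a tuple of booleans;
    the entropy is taken over the empirical distribution of those tuples.
    """
    n = len(transactions)
    projections = Counter(
        tuple(item in t for item in itemset) for t in transactions
    )
    return -sum((c / n) * log2(c / n) for c in projections.values())


def brute_force_miki(transactions, k):
    """Return one k-itemset with maximal joint entropy by exhaustive search.

    This enumerates every k-combination of items, so it is only practical
    for tiny inputs; it is shown to make the definition concrete.
    """
    items = sorted(set().union(*transactions))
    return max(
        combinations(items, k),
        key=lambda s: joint_entropy(transactions, s),
    )


# Hypothetical toy database: item "c" occurs everywhere, so it carries no information.
toy = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"c"}]
best = brute_force_miki(toy, 2)
```

On this toy input, the pair of items "a" and "b" splits the four transactions into four distinct presence patterns, whereas any pair containing "c" yields only two. The exhaustive enumeration grows combinatorially with k and with the number of items, which is the scaling problem the abstract says the parallel approach addresses.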
- Is Part Of:
- Knowledge and information systems. Volume 50:Issue 1(2017:Jan.)
- Journal:
- Knowledge and information systems
- Issue:
- Volume 50:Issue 1(2017:Jan.)
- Issue Display:
- Volume 50, Issue 1 (2017)
- Year:
- 2017
- Volume:
- 50
- Issue:
- 1
- Issue Sort Value:
- 2017-0050-0001-0000
- Page Start:
- 1
- Page End:
- 26
- Publication Date:
- 2017-01
- Subjects:
- Joint entropy -- Informative itemsets -- Massive distribution -- MapReduce -- Spark -- Hadoop -- Big data
Expert systems (Computer science) -- Periodicals
Information storage and retrieval systems -- Periodicals
006.33
- Journal URLs:
- http://link.springer-ny.com/link/service/journals/10115/index.htm
http://www.springerlink.com/content/0219-1377
http://www.springer.com/gb/
- DOI:
- 10.1007/s10115-016-0931-2
- Languages:
- English
- ISSNs:
- 0219-1377
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5100.437300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9980.xml