Efficient strategies for incremental mining of frequent closed itemsets over data streams. (1st April 2022)
- Record Type:
- Journal Article
- Title:
- Efficient strategies for incremental mining of frequent closed itemsets over data streams. (1st April 2022)
- Main Title:
- Efficient strategies for incremental mining of frequent closed itemsets over data streams
- Authors:
- Liu, Junqiang
Ye, Zhousheng
Yang, Xiangcai
Wang, Xueling
Shen, Linjie
Jiang, Xiaoning
- Abstract:
- Mining frequent closed itemsets over data streams is an important data mining problem. Mining data streams is more challenging than mining static data because of the nature of data streams, including high arrival rate, massive volume of incoming data, and concept drift. The existing algorithms for mining frequent closed itemsets over data streams suffer from scalability and efficiency bottlenecks. This paper proposes a novel algorithm for mining frequent closed itemsets over data streams both for the sliding window model and for the landmark model. An indexed prefix closed itemset tree is proposed for compressing all closed itemsets and for quick searching of closed itemsets, and novel search strategies are proposed to prune the search space in updating the set of closed itemsets. The proposed algorithm outperforms the state-of-the-art intersection-based algorithms, CICLAD, ConPatSet, and CloStream, by several times to 2 orders of magnitude in efficiency, and also outperforms the state-of-the-art pattern enumeration algorithm, Moment, by up to 2 orders of magnitude over data streams with large windows and sparse data streams. The proposed algorithm is also superior in scalability.
Highlights: Mining closed itemsets over data streams for sliding window and landmark models. Intersection-based approach by novel data structure and pruning strategies. Insightful analysis and theoretical proof for handling the transaction deletion. Efficiency improvement by up to 2 orders of magnitude.
- Is Part Of:
- Expert systems with applications. Volume 191(2022)
- Journal:
- Expert systems with applications
- Issue:
- Volume 191(2022)
- Issue Display:
- Volume 191, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 191
- Issue:
- 2022
- Issue Sort Value:
- 2022-0191-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-04-01
- Subjects:
- Data streams -- Closed itemsets -- Frequent itemsets -- Data mining -- Knowledge discovery
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2021.116220
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20351.xml
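
The abstract above refers to an intersection-based approach to maintaining closed itemsets as transactions arrive. The following is a minimal Python sketch of that general idea for insertions only (the landmark model), using a naive scan over all stored closed itemsets. It is not the paper's algorithm: the indexed prefix closed itemset tree, the pruning strategies, and the handling of transaction deletion for sliding windows are not reproduced here. The names `add_transaction`, `frequent`, and `closed` are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, Iterable

# Maps each closed itemset seen so far to its support count.
ClosedSets = Dict[FrozenSet[str], int]


def add_transaction(closed: ClosedSets, transaction: Iterable[str]) -> None:
    """Update the closed itemsets in place after one new transaction.

    Assumption (illustrative, not taken from the paper): after adding
    transaction t, the closed itemsets are the old ones, plus t itself,
    plus every non-empty intersection of t with an old closed itemset.
    """
    t = frozenset(transaction)
    if not t:
        return
    # For each candidate subset of t, record its support before t arrived.
    # That is the largest support among old closed itemsets whose
    # intersection with t equals the candidate; t itself starts at 0.
    old_support = {t: 0}
    for itemset, support in closed.items():
        common = itemset & t
        if common and support > old_support.get(common, 0):
            old_support[common] = support
    # Every candidate is contained in t, so its support grows by one.
    for itemset, support in old_support.items():
        closed[itemset] = support + 1


def frequent(closed: ClosedSets, min_support: int) -> ClosedSets:
    """Return only the closed itemsets that meet the support threshold."""
    return {s: n for s, n in closed.items() if n >= min_support}


# Hypothetical three-transaction stream used purely as an example.
closed: ClosedSets = {}
for items in (["a", "b", "c"], ["a", "b"], ["b", "d"]):
    add_transaction(closed, items)
```

Each insertion in this sketch scans every stored closed itemset, which is the kind of cost that an index and pruning strategies, as described in the abstract, are meant to reduce.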