Multi-label classification via incremental clustering on an evolving data stream. (November 2019)
- Record Type:
- Journal Article
- Title:
- Multi-label classification via incremental clustering on an evolving data stream. (November 2019)
- Main Title:
- Multi-label classification via incremental clustering on an evolving data stream
- Authors:
- Nguyen, Tien Thanh
Dang, Manh Truong
Luong, Anh Vu
Liew, Alan Wee-Chung
Liang, Tiancai
McCall, John
- Abstract:
- Highlights: An incremental clustering-based multi-label online classification algorithm for multi-label data streams is proposed. To handle concept drift, our algorithm evolves with time, giving higher attention to more recent samples than older samples through a weight decay mechanism. Our algorithm dynamically determines the number of predicted labels based on the Hoeffding inequality and the label cardinality. Extensive comparative experiments with state-of-the-art algorithms validated the superior performance of our algorithm in both the stationary and concept drift settings.
Abstract: With the advancement of storage and processing technology, an enormous amount of data is collected on a daily basis in many applications. Nowadays, advanced data analytics have been used to mine the collected data for useful information and make predictions, contributing to the competitive advantages of companies. The increasing data volume, however, has posed many problems to classical batch learning systems, such as the need to retrain the model completely with the newly arrived samples or the impracticality of storing and accessing a large volume of data. This has prompted interest in incremental learning that operates on data streams. In this study, we develop an incremental online multi-label classification (OMLC) method based on a weighted clustering model. The model is made to adapt to the change of data via a decay mechanism in which each sample's weight dwindles away over time. The clustering model therefore always focuses more on newly arrived samples. In the classification process, only clusters whose weights are greater than a threshold (called mature clusters) are employed to assign labels to the samples. In our method, not only is the clustering model incrementally maintained with the revealed ground truth labels of the arrived samples, but the number of predicted labels for a sample is also adjusted based on the Hoeffding inequality and the label cardinality. The experimental results show that our method is competitive compared to several well-known benchmark algorithms on six performance measures in both the stationary and the concept drift settings.
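The abstract names three mechanisms without giving formulas: a weight that decays over time, a weight threshold that marks a cluster as mature, and the Hoeffding inequality. The Python sketch below illustrates these ideas only and is not the authors' implementation. The exponential decay form `2 ** (-decay_rate * elapsed)`, the function names, the cluster representation, and all parameter values are assumptions made for this example. The Hoeffding bound itself is the standard formula; how the paper combines it with the label cardinality to choose the number of predicted labels is not stated in this record and is not modelled here.

```python
import math


def decayed_weight(weight, elapsed, decay_rate=0.01):
    """Fade a weight over time (assumed form: weight * 2^(-decay_rate * elapsed))."""
    return weight * 2.0 ** (-decay_rate * elapsed)


def mature_clusters(clusters, now, threshold=1.0, decay_rate=0.01):
    """Return ids of clusters whose decayed weight exceeds the threshold.

    `clusters` maps a cluster id to a (weight, last_update_time) pair;
    this representation is hypothetical.
    """
    return [
        cid
        for cid, (weight, last_update) in clusters.items()
        if decayed_weight(weight, now - last_update, decay_rate) > threshold
    ]


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: sqrt(R^2 * ln(1 / delta) / (2 * n)).

    With probability at least 1 - delta, the mean of n observations with
    range R lies within this distance of the true mean.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # Three toy clusters: recently updated, stale, and lightweight.
    clusters = {"a": (5.0, 90), "b": (5.0, 0), "c": (0.5, 100)}
    print(mature_clusters(clusters, now=100, threshold=3.0))
    print(hoeffding_bound(value_range=1.0, delta=0.05, n=200))
```

In this toy setup a cluster that has not been updated for a long time drops below the maturity threshold even though its initial weight was high, which is the behaviour the abstract describes as focusing on newly arrived samples.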
- Is Part Of:
- Pattern recognition. Volume 95(2019:Nov.)
- Journal:
- Pattern recognition
- Issue:
- Volume 95(2019:Nov.)
- Issue Display:
- Volume 95 (2019)
- Year:
- 2019
- Volume:
- 95
- Issue Sort Value:
- 2019-0095-0000-0000
- Page Start:
- 96
- Page End:
- 113
- Publication Date:
- 2019-11
- Subjects:
- Multi-label classification -- Incremental learning -- Online learning -- Clustering -- Data stream -- Concept drift
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.06.001
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11157.xml