An extensive study of C-SMOTE, a Continuous Synthetic Minority Oversampling Technique for Evolving Data Streams. (15th June 2022)
- Record Type:
- Journal Article
- Title:
- An extensive study of C-SMOTE, a Continuous Synthetic Minority Oversampling Technique for Evolving Data Streams. (15th June 2022)
- Main Title:
- An extensive study of C-SMOTE, a Continuous Synthetic Minority Oversampling Technique for Evolving Data Streams
- Authors:
- Bernardo, Alessio
Della Valle, Emanuele
- Abstract:
- Streaming Machine Learning (SML) studies algorithms that update their models while performing a single pass over an unbounded and often non-stationary flow of data. Online class imbalance learning is a branch of SML that combines the challenges of both class imbalance and concept drift. In this paper, we investigate the binary classification problem by rebalancing an imbalanced stream of data in the presence of concept drift, accessing one sample at a time.
We propose an extensive comparative study of Continuous Synthetic Minority Oversampling Technique (C-SMOTE), inspired by the popular sampling technique SMOTE, as a meta-strategy to pipeline with SML classification algorithms. We benchmark C-SMOTE pipelines on both synthetic and real data streams, containing different types of concept drifts, different imbalance levels, and different class distributions. We bring statistical evidence that models learnt with C-SMOTE pipelines improve the minority class performance with respect to both the baseline models and the state-of-the-art methods. We also perform a sensitivity analysis to detect the C-SMOTE impact on the majority class performance for the three types of concept drift and several class distributions. Moreover, we show a computational cost analysis in terms of time and memory consumption.
Highlights: There is a trade-off between the performances of the two classes. The performances of the minority class are, in most of the cases, increased. The gain in the minority class recall is bigger than the loss in the majority one. Time and RAM consumed are more than the state-of-the-art ones. …
- Is Part Of:
- Expert systems with applications. Volume 196(2022)
- Journal:
- Expert systems with applications
- Issue:
- Volume 196(2022)
- Issue Display:
- Volume 196, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 196
- Issue:
- 2022
- Issue Sort Value:
- 2022-0196-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06-15
- Subjects:
- Evolving Data Stream -- Streaming -- Concept drift -- Balancing
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2022.116630
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21012.xml