The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art. Issue 4 (15th October 2020)
- Record Type:
- Journal Article
- Title:
- The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art. Issue 4 (15th October 2020)
- Main Title:
- The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art
- Authors:
- Susan, Seba
Kumar, Amitesh
- Abstract:
- This survey paper focuses on one of the current primary issues challenging data mining researchers experimenting on real‐world datasets. The problem is that of imbalanced class distribution that generates a bias toward the majority class due to insufficient training samples from the minority class. The current machine learning and deep learning algorithms are trained on datasets that are insufficiently represented in certain categories. On the other hand, some other classes have surplus samples due to the ready availability of data from these categories. Conventional solutions suggest undersampling of the majority class and/or oversampling of the minority class for balancing the class distribution prior to the learning phase. Though this problem of uneven class distribution is, by and large, ignored by researchers focusing on the learning technology, a need has now arisen for incorporating balance correction and data pruning procedures within the learning process itself. This paper surveys a plethora of conventional and recent techniques that address this issue through intelligent representations of samples from the majority and minority classes, that are given as input to the learning module. The application of nature‐inspired evolutionary algorithms to intelligent sampling is examined, and so are hybrid sampling strategies that select and retain the difficult‐to‐learn samples and discard the easy‐to‐learn samples. The findings by various researchers are summarized to a logical end, and various possibilities and challenges for future directions in research are outlined.
This paper surveys recent sampling techniques addressing the class‐imbalance issue. The application of nature‐inspired evolutionary optimization techniques to intelligent sampling is examined and so are hybrid sampling strategies that select and retain the difficult‐to‐learn samples and discard the easy‐to‐learn samples. The findings by various researchers are summarized to a logical end, and various possibilities for the future are outlined.
- Is Part Of:
- Engineering reports. Volume 3:Issue 4(2021)
- Journal:
- Engineering reports
- Issue:
- Volume 3:Issue 4(2021)
- Issue Display:
- Volume 3, Issue 4 (2021)
- Year:
- 2021
- Volume:
- 3
- Issue:
- 4
- Issue Sort Value:
- 2021-0003-0004-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2020-10-15
- Subjects:
- class‐imbalance problem -- hybrid sampling -- imbalanced data -- oversampling -- sampling -- undersampling
Engineering -- Periodicals
Computer science -- Periodicals
620.005
- Journal URLs:
- http://onlinelibrary.wiley.com/
https://onlinelibrary.wiley.com/loi/25778196
- DOI:
- 10.1002/eng2.12298
- Languages:
- English
- ISSNs:
- 2577-8196
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16546.xml