FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification. (April 2022)
- Record Type:
- Journal Article
- Title:
- FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification. (April 2022)
- Main Title:
- FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification
- Authors:
- Maldonado, Sebastián
Vairetti, Carla
Fernandez, Alberto
Herrera, Francisco
- Abstract:
- Highlights: We claim that SMOTE has a weakness when facing high-dimensional problems. We propose a general version of the SMOTE strategy using OWA operators. The proposal includes a feature weighting process that considers relevancy/redundancy. This new component leads to a better definition of the neighborhood of minority samples. Experiments carried out on 42 datasets show the virtues of our method.
Abstract: The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known resampling strategy that has been successfully used for dealing with the class-imbalance problem, one of the most challenging pattern recognition tasks in the last two decades. In this work, we claim that SMOTE has an important issue when defining the neighborhood in order to create new minority samples: the use of the Euclidean distance may not be suitable in high-dimensional settings. Our hypothesis is that the use of a weighted metric that does not assume that all features are equally important could improve performance in the presence of noisy/redundant variables. In this line, we present a novel SMOTE-like method that uses the weighted Minkowski distance for defining the neighborhood for each example of the minority class. This methodology leads to a better definition of the neighborhood since it prioritizes those features that are more relevant for the classification task. A complementary advantage of the proposal is performing feature selection since attributes can be discarded when their corresponding weights are below a given threshold. Our experiments on 42 class-imbalance datasets show the virtues of the proposed SMOTE variant, achieving the best predictive performance when compared with the traditional SMOTE approach and other recent variants on low- and high-dimensional settings, handling issues such as class overlap and hubness adequately without increasing the complexity of the method.
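Illustrative note: the idea in the abstract (neighborhoods defined by a weighted Minkowski distance, with low-weight attributes discarded) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `weighted_minkowski` and `weighted_smote` are hypothetical, the feature weights `w` are assumed to be given (the record says the paper derives them with OWA operators from feature relevance, which is not reproduced here), and the convention of placing the weight outside the power, `(sum_j w_j * |a_j - b_j|**p) ** (1/p)`, is an assumption.

```python
import numpy as np

def weighted_minkowski(a, b, w, p=2.0):
    """Weighted Minkowski distance, assuming (sum_j w_j * |a_j - b_j|**p) ** (1/p)."""
    a, b, w = (np.asarray(v, dtype=float) for v in (a, b, w))
    return float(np.sum(w * np.abs(a - b) ** p) ** (1.0 / p))

def weighted_smote(X_min, w, n_new, k=5, p=2.0, threshold=0.0, seed=0):
    """SMOTE-style interpolation among minority samples (needs at least 2 rows).

    Neighbors are chosen with the weighted distance above. Features whose
    weight is below `threshold` are dropped (assumed feature-selection rule).
    Returns the synthetic samples and the boolean mask of kept features.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    w = np.asarray(w, dtype=float)

    keep = w >= threshold                 # discard attributes with small weights
    X, wk = X_min[:, keep], w[keep]
    n = X.shape[0]
    k = min(k, n - 1)                     # cannot have more neighbors than other samples

    # Pairwise weighted Minkowski distances between minority samples.
    diff = np.abs(X[:, None, :] - X[None, :, :]) ** p
    D = np.sum(wk * diff, axis=2) ** (1.0 / p)
    np.fill_diagonal(D, np.inf)           # a sample is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]     # indices of the k nearest neighbors

    # Pick a base sample and one of its neighbors, then interpolate between them.
    base = rng.integers(0, n, size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[neigh] - X[base]), keep
```

With all weights equal to 1 and `p=2`, the neighbor search reduces to the Euclidean distance used by standard SMOTE, which is the baseline the abstract argues against in high-dimensional settings.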
- Is Part Of:
- Pattern recognition. Volume 124(2022)
- Journal:
- Pattern recognition
- Issue:
- Volume 124(2022)
- Issue Display:
- Volume 124, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 124
- Issue:
- 2022
- Issue Sort Value:
- 2022-0124-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-04
- Subjects:
- Data resampling -- SMOTE -- OWA Operators -- Feature selection -- Imbalanced data classification
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2021.108511
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22256.xml