A hybrid data‐level sampling approach in learning from skewed user‐click data for click fraud detection in online advertising. Issue 2 (21st September 2022)
- Record Type:
- Journal Article
- Title:
- A hybrid data‐level sampling approach in learning from skewed user‐click data for click fraud detection in online advertising. Issue 2 (21st September 2022)
- Main Title:
- A hybrid data‐level sampling approach in learning from skewed user‐click data for click fraud detection in online advertising
- Authors:
- Sisodia, Deepti
Sisodia, Dilip Singh
- Other Names:
- Yu Hui, guest editor
- Abstract:
- One of the challenging issues in user‐click data of online advertising is the uneven class distribution, which biases classification models. Resampling the data is a popular choice for obtaining class balance. However, oversampling results in overfitting, whilst under‐sampling results in information loss. Moreover, enhancing separability between samples where the classes overlap close to the decision boundary is another challenge, which requires careful pruning of instances to increase the separability in data space. Therefore, in this work, a new hybrid data sampling algorithm, SMOTEOSS, is designed and evaluated. It applies the synthetic minority oversampling technique (SMOTE) followed by one‐sided selection (OSS) to balance the class distribution. The working of SMOTEOSS is twofold. First, it oversamples the under‐represented class using SMOTE by generating synthetic instances. However, the generation of synthetic instances close to the decision boundary directly influences the learning model's decision‐making. Second, utilising OSS, the proposed method identifies Tomek links and eliminates the noisy majority instances along with the redundant instances. The proposed method's effectiveness is validated on the FDMA 2012 dataset against 10 state‐of‐the‐art sampling methods utilising the gradient tree boosting learning model. To authenticate SMOTEOSS, a fair comparison is made by conducting experiments on 10 other benchmark imbalanced datasets using 10‐fold cross‐validation. Performance is measured using average precision, recall, F1‐score, G‐mean, the area under curve (AUC) and reduction rate. Results showed that the designed hybrid methodology is an efficient alternative to existing sampling methods. The Wilcoxon signed‐rank test is employed to demonstrate significant differences between the proposed and conventional sampling algorithms.
- Is Part Of:
- Expert systems. Volume 40:Issue 2(2023)
- Journal:
- Expert systems
- Issue:
- Volume 40:Issue 2(2023)
- Issue Display:
- Volume 40, Issue 2 (2023)
- Year:
- 2023
- Volume:
- 40
- Issue:
- 2
- Issue Sort Value:
- 2023-0040-0002-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2022-09-21
- Subjects:
- class imbalance -- click fraud -- data sampling -- hybrid sampling -- majority -- minority -- SMOTEOSS
Expert systems (Computer science)
006.33
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1468-0394
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/exsy.13147
- Languages:
- English
- ISSNs:
- 0266-4720
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 25665.xml
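
The two-stage procedure summarised in the abstract (SMOTE oversampling, then one-sided selection) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the neighbour count `k`, the fixed `seed`, the 1:1 balancing target and the toy data in the usage note are all assumptions made for illustration, and the quadratic nearest-neighbour search is only suitable for small examples. In practice a maintained library such as imbalanced-learn would likely be used instead.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def smote(minority, n_new, k, rng):
    """Interpolate n_new synthetic points between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        others = [q for q in minority if q is not p]
        neighbours = sorted(others, key=lambda q: dist(p, q))[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # position along the segment p -> q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


def nearest(i, X, candidates):
    """Index of the candidate closest to X[i], excluding i itself."""
    return min((j for j in candidates if j != i),
               key=lambda j: dist(X[i], X[j]))


def one_sided_selection(X, y, majority_label, rng):
    """Return indices kept by OSS; only majority points can be dropped."""
    majority = [i for i, lab in enumerate(y) if lab == majority_label]
    # Condensing: start from all non-majority points plus one random
    # majority seed, then add the majority points that 1-NN on this
    # store would misclassify (the others are treated as redundant).
    store = [i for i, lab in enumerate(y) if lab != majority_label]
    store.append(rng.choice(majority))
    keep = store + [i for i in majority
                    if i != store[-1]
                    and y[nearest(i, X, store)] != majority_label]

    # Cleaning: drop majority points that form a Tomek link, i.e. whose
    # nearest neighbour has the other label and points back at them.
    def in_tomek_link(i):
        j = nearest(i, X, keep)
        return y[j] != y[i] and nearest(j, X, keep) == i

    return [i for i in keep
            if not (y[i] == majority_label and in_tomek_link(i))]


def smote_oss(X, y, minority_label, majority_label, k=3, seed=0):
    """Oversample the minority class to parity, then prune with OSS."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(0, (len(X) - len(minority)) - len(minority))
    X_bal = list(X) + smote(minority, n_new, k, rng)
    y_bal = list(y) + [minority_label] * n_new
    kept = one_sided_selection(X_bal, y_bal, majority_label, rng)
    return [X_bal[i] for i in kept], [y_bal[i] for i in kept]
```

As a usage example under the same assumptions, calling `smote_oss(X, y, minority_label=1, majority_label=0)` on a list of feature tuples `X` and labels `y` returns the resampled points and labels. Minority points (original and synthetic) are always retained, while majority points may be removed as redundant or as members of Tomek links.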