Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning. (15th July 2021)
- Record Type:
- Journal Article
- Title:
- Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning. (15th July 2021)
- Main Title:
- Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning
- Authors:
- Engelmann, Justin
Lessmann, Stefan
- Abstract:
- Highlights: We design a tabular data GAN for oversampling that can handle categorical variables. We assess our GAN in a credit scoring setting using multiple real-world datasets. We find GAN-based oversampling to outperform advanced SMOTE-type benchmarks. Ablations confirm the specific choices in the proposed GAN architecture.
Abstract: Class imbalance impedes the predictive performance of classification models. Popular countermeasures include oversampling minority class cases by creating synthetic examples. The paper examines the potential of Generative Adversarial Networks (GANs) for oversampling. A few prior studies have used GANs for this purpose but do not reflect recent methodological advancements for generating tabular data using GANs. The paper proposes an approach based on a conditional Wasserstein GAN that can effectively model tabular datasets with numerical and categorical variables and pays special attention to the down-stream classification task through an auxiliary classifier loss. We focus on a credit scoring context in which binary classifiers predict the default risk of loan applications. Empirical comparisons in this context evidence the competitiveness of GAN-based oversampling compared to several standard oversampling regimes. We also clarify the conditions under which oversampling in general and the proposed GAN-based approach in particular raise predictive performance. In sum, our findings suggest that GAN architectures for tabular data and our extensions deserve a place in data scientists' modelling toolbox.
- Is Part Of:
- Expert systems with applications. Volume 174(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 174(2021)
- Issue Display:
- Volume 174, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 174
- Issue:
- 2021
- Issue Sort Value:
- 2021-0174-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07-15
- Subjects:
- Imbalanced learning -- Generative adversarial networks -- Credit scoring -- Oversampling
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2021.114582
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24940.xml