Imbalanced Classification via a Tabular Translation GAN
This work addresses class imbalance in tabular data for predictive modeling, but the contribution is incremental: it builds on existing GAN and oversampling techniques.
The paper tackles binary classification under severe class imbalance. It uses a GAN-based model with regularization losses to translate majority samples into synthetic minority samples near the class boundary, and it reports improved average precision over alternative methods on tabular datasets.
When presented with a binary classification problem where the data exhibits severe class imbalance, most standard predictive methods may fail to accurately model the minority class. We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples. This translation mechanism encourages the synthesized samples to be close to the class boundary. Furthermore, we explore a selection criterion to retain the most useful of the synthesized samples. Experimental results using several downstream classifiers on a variety of tabular class-imbalanced datasets show that the proposed method improves average precision when compared to alternative re-weighting and oversampling techniques.
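Since the comparison above is reported in terms of average precision, a minimal sketch of that metric may help make the evaluation concrete. This is the standard rank-based definition, not code from the paper; the function name `average_precision` and the toy labels and scores are illustrative assumptions, and the sketch assumes there are no tied scores.

```python
def average_precision(y_true, scores):
    """Rank-based average precision for binary labels (1 = minority class).

    Illustrative sketch: assumes no tied scores. With ties, library
    implementations group equal scores into one threshold and may differ.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")

    # Rank samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            # Precision at this rank; recall grows by 1 / n_pos at each hit.
            total += hits / rank
    return total / n_pos


# Toy example (made-up values): two minority samples among five.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
ap = average_precision(y_true, scores)  # (1/1 + 2/3) / 2 = 5/6
```

Unlike accuracy, this metric depends only on how highly the minority samples are ranked, which is why it is a common choice when the minority class is rare.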