LG, AI · Aug 29, 2023

From SMOTE to Mixup for Deep Imbalanced Classification

arXiv:2308.15457v2 · 11 citations · h-index: 79 · Has Code
Originality: Highly original
AI Analysis

This work addresses the challenge of training effective deep learning classifiers on imbalanced datasets, which are common in real-world applications such as medical diagnosis and fraud detection. It is incremental, building on existing methods such as SMOTE and Mixup.

The paper tackles poor generalization on minority classes in deep imbalanced classification by proposing a margin-aware Mixup technique, which achieves state-of-the-art performance and particularly strong results on extremely imbalanced data.
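The idea behind a margin-aware method is that rarer classes should get larger decision margins than frequent ones. The NumPy sketch below makes "uneven margins" concrete. It borrows the per-class margin rule from LDAM (Cao et al., 2019), where class j gets a margin proportional to n_j^(-1/4), and subtracts that margin from the true-class logit inside cross-entropy. This is a swapped-in illustration and not the paper's margin-aware Mixup. The function names and the `max_margin` and `scale` defaults are assumptions chosen for the example.

```python
import numpy as np

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4) (LDAM-style, illustrative).

    Rescaled so that the rarest class receives `max_margin` (assumed default).
    """
    counts = np.asarray(class_counts, dtype=float)
    raw = 1.0 / counts ** 0.25
    return raw * (max_margin / raw.max())

def margin_cross_entropy(logits, labels, margins, scale=1.0):
    """Cross-entropy where each sample's true-class logit is reduced by its class margin.

    A larger margin forces the model to separate that class more confidently.
    `scale` multiplies the logits after the margin is applied (assumed default 1.0).
    """
    logits = np.asarray(logits, dtype=float).copy()
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    logits[rows, labels] -= margins[labels]           # enforce class-dependent margin
    logits *= scale
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

For class counts `[5000, 500, 50]`, `ldam_margins` returns roughly `[0.158, 0.281, 0.5]`, so the minority class is pushed hardest. Because each margin is positive, the margin loss is always at least as large as plain cross-entropy on the same logits.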

Given imbalanced data, it is hard to train a good classifier using deep learning because of the poor generalization of minority classes. Traditionally, the well-known synthetic minority oversampling technique (SMOTE) for data augmentation, a data mining approach for imbalanced learning, has been used to improve this generalization. However, it is unclear whether SMOTE also benefits deep learning. In this work, we study why the original SMOTE is insufficient for deep learning, and enhance SMOTE using soft labels. Connecting the resulting soft SMOTE with Mixup, a modern data augmentation technique, leads to a unified framework that puts traditional and modern data augmentation techniques under the same umbrella. A careful study within this framework shows that Mixup improves generalization by implicitly achieving uneven margins between majority and minority classes. We then propose a novel margin-aware Mixup technique that more explicitly achieves uneven margins. Extensive experimental results demonstrate that our proposed technique yields state-of-the-art performance on deep imbalanced classification while achieving superior performance on extremely imbalanced data. The code is open-sourced in our developed package https://github.com/ntucllab/imbalanced-DL to foster future research in this direction.
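The abstract connects SMOTE and Mixup through the label. Classic SMOTE interpolates a minority sample toward one of its nearest minority neighbours and assigns the synthetic point a hard minority label, while Mixup interpolates random pairs of inputs and also interpolates their one-hot labels, producing soft labels. The NumPy sketch below contrasts these two standard ingredients. The abstract does not specify the exact pairing and soft-label scheme of the paper's "soft SMOTE," so this sketch does not reproduce it. The function names, `k`, and `alpha` defaults are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_sample(x_minority, k=5, rng=rng):
    """Classic SMOTE step (sketch): one synthetic minority point.

    Picks a random minority point, chooses one of its k nearest minority
    neighbours, and interpolates between them. The result keeps the
    minority class's hard label. Assumes k < len(x_minority).
    """
    i = rng.integers(len(x_minority))
    dists = np.linalg.norm(x_minority - x_minority[i], axis=1)
    neighbours = np.argsort(dists)[1:k + 1]   # skip the point itself
    j = rng.choice(neighbours)
    lam = rng.uniform()                        # interpolation weight in [0, 1)
    return x_minority[i] + lam * (x_minority[j] - x_minority[i])

def mixup_batch(x, y_onehot, alpha=1.0, rng=rng):
    """Mixup (sketch): convex combinations of random pairs of inputs AND labels.

    lam ~ Beta(alpha, alpha); the mixed labels are soft (each row still sums to 1).
    """
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```

Both functions draw synthetic points on line segments between two real samples. The key difference is the target: SMOTE pairs only minority neighbours and keeps a hard label, whereas Mixup pairs arbitrary samples and mixes the labels in the same proportion as the inputs.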

Code Implementations: 1 repo
