When resampling/reweighting improves feature learning in imbalanced classification?: A toy-model study
This study offers theoretical insight for machine learning researchers working with imbalanced data, though its contribution is incremental because it builds on prior findings.
The study uses a toy model to analyze when class-wise resampling or reweighting improves feature learning in imbalanced classification, finding cases where no resampling yields the best performance regardless of loss or classifier choice, with the key factor being symmetry in the loss and problem setting.
A toy model of binary classification is studied to clarify the effect of class-wise resampling/reweighting on feature learning performance in the presence of class imbalance. In the analysis, a high-dimensional limit of the input space is taken while keeping the ratio of the dataset size to the input dimension finite, and the non-rigorous replica method from statistical mechanics is employed. The result shows that there exists a case in which no resampling/reweighting gives the best feature learning performance irrespective of the choice of losses or classifiers, supporting recent findings in Cao et al. (2019) and Kang et al. (2019). It is also revealed that the key to this result is the symmetry of the loss and the problem setting. Inspired by this, we propose a further simplified model exhibiting the same property in the multiclass setting. These results clarify when class-wise resampling/reweighting becomes effective in imbalanced classification.
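To make the notion of class-wise reweighting concrete, the following is a minimal sketch rather than the model analyzed in the study. It assumes a linear classifier with a logistic loss and labels in {-1, +1}, and it introduces a hypothetical exponent `gamma` that interpolates between no reweighting (`gamma = 0`) and inverse-frequency weights (`gamma = 1`). The function names and the weight normalization are illustrative choices, not taken from the source.

```python
import math
from collections import Counter


def class_weights(labels, gamma=1.0):
    """Per-class weights proportional to (1 / n_c) ** gamma (illustrative choice).

    gamma = 0 gives uniform weights (no reweighting); gamma = 1 gives
    inverse-frequency weights. Weights are scaled so that the average
    weight over all samples equals 1.
    """
    counts = Counter(labels)
    raw = {c: n ** (-gamma) for c, n in counts.items()}
    # Scale so that the sample-averaged weight is exactly 1.
    total = sum(raw[c] * counts[c] for c in counts)
    scale = len(labels) / total
    return {c: w * scale for c, w in raw.items()}


def reweighted_logistic_loss(w, b, xs, labels, gamma=1.0):
    """Mean class-weighted logistic loss of a linear classifier (w, b)."""
    weights = class_weights(labels, gamma)
    total = 0.0
    for x, y in zip(xs, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # log(1 + exp(-margin)), written to avoid overflow for large |margin|.
        total += weights[y] * (max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return total / len(labels)


# Imbalanced toy labels: nine positive samples, one negative sample.
labels = [1] * 9 + [-1]
xs = [[0.5]] * 9 + [[-0.5]]
balanced = class_weights(labels, gamma=1.0)
uniform = class_weights(labels, gamma=0.0)
loss_at_zero = reweighted_logistic_loss([0.0], 0.0, xs, labels, gamma=1.0)
```

With `gamma = 1`, both classes contribute the same total weight to the loss despite the 9:1 imbalance, while `gamma = 0` recovers the unweighted loss. Comparing what is learned under different values of `gamma` is the kind of question the abstract addresses, although the study does so analytically rather than numerically.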