LG ST Dec 1, 2022

High Dimensional Binary Classification under Label Shift: Phase Transition and Regularization

Georgia Tech
arXiv:2212.00700v3, 1 citation, h-index: 51
Originality: Incremental advance
AI Analysis

This work addresses label shift for machine learning practitioners working in high-dimensional settings. It provides theoretical insights that challenge conventional balancing methods, though the advance is incremental because it builds on existing Fisher Linear Discriminant analysis.

The paper studies label shift in high-dimensional binary classification. It shows that, in certain overparametrized regimes, a classifier trained on imbalanced data can outperform one trained on a reduced, balanced subset of the same data, and that this phase transition disappears under strong regularization.

Label shift has been widely believed to be harmful to the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate its impact, e.g., balancing the training data. However, these methods often consider the underparametrized regime, where the sample size is much larger than the data dimension, and research on the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that there exists a phase transition phenomenon: under a certain overparametrized regime, the classifier trained using imbalanced data outperforms its counterpart trained on reduced balanced data. Moreover, we investigate the impact of regularization on label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
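The comparison described in the abstract can be explored numerically. The following is a minimal sketch, not the paper's own experiment. It makes several assumptions that do not come from the source: a two-class Gaussian mixture with identity covariance and means +mu and -mu, a ridge-regularized Fisher Linear Discriminant with a midpoint threshold, a balanced test set, and all function names and default parameters (d, n_pos, n_neg, lam, signal). It trains one classifier on the full imbalanced sample and another on a balanced subsample obtained by discarding majority-class points, then reports both test errors.

```python
import numpy as np


def sample_gaussian_mixture(n_pos, n_neg, mu, rng):
    """Draw n_pos points from N(+mu, I) (label +1) and n_neg from N(-mu, I) (label -1)."""
    d = mu.shape[0]
    x_pos = rng.standard_normal((n_pos, d)) + mu
    x_neg = rng.standard_normal((n_neg, d)) - mu
    x = np.vstack([x_pos, x_neg])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return x, y


def fit_regularized_fld(x, y, lam):
    """Assumed form of regularized FLD: w = (S + lam * I)^{-1} (m_pos - m_neg)."""
    m_pos = x[y == 1].mean(axis=0)
    m_neg = x[y == -1].mean(axis=0)
    centered = np.vstack([x[y == 1] - m_pos, x[y == -1] - m_neg])
    s = centered.T @ centered / x.shape[0]  # pooled within-class covariance
    w = np.linalg.solve(s + lam * np.eye(x.shape[1]), m_pos - m_neg)
    b = -w @ (m_pos + m_neg) / 2.0  # threshold at the midpoint of the class means
    return w, b


def subsample_balanced(x, y, rng):
    """Discard majority-class points at random until both classes have equal size."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    k = min(pos.size, neg.size)
    keep = np.concatenate([
        rng.choice(pos, size=k, replace=False),
        rng.choice(neg, size=k, replace=False),
    ])
    return x[keep], y[keep]


def balanced_test_error(w, b, mu, n_test, rng):
    """Misclassification rate on a fresh test set with equal class proportions."""
    x, y = sample_gaussian_mixture(n_test // 2, n_test // 2, mu, rng)
    pred = np.where(x @ w + b >= 0, 1.0, -1.0)
    return float(np.mean(pred != y))


def compare(d=200, n_pos=20, n_neg=100, lam=1e-2, signal=2.0, n_test=4000, seed=0):
    """Train on imbalanced data and on its balanced subsample; return both test errors."""
    rng = np.random.default_rng(seed)
    mu = np.full(d, signal / np.sqrt(d))  # class mean with Euclidean norm `signal`
    x, y = sample_gaussian_mixture(n_pos, n_neg, mu, rng)
    x_bal, y_bal = subsample_balanced(x, y, rng)
    w_imb, b_imb = fit_regularized_fld(x, y, lam)
    w_bal, b_bal = fit_regularized_fld(x_bal, y_bal, lam)
    return {
        "imbalanced": balanced_test_error(w_imb, b_imb, mu, n_test, rng),
        "reduced_balanced": balanced_test_error(w_bal, b_bal, mu, n_test, rng),
    }


if __name__ == "__main__":
    print(compare())
```

Sweeping d relative to n_pos + n_neg and varying lam is one way to probe where the two errors cross. Which classifier wins at any given setting depends on these assumed parameters, and a single finite-sample run is only a rough proxy for the asymptotic statement in the abstract.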

