LG | Jun 12, 2025

Sampling Imbalanced Data with Multi-objective Bilevel Optimization

Karen Medlin, Sven Leyffer, Krishnan Raghavan

arXiv:2506.11315v2 | h-index: 3

Originality: Highly original

AI Analysis

This paper addresses classification challenges in imbalanced datasets, offering a novel approach with measurable gains.

The paper tackles the problem of imbalanced two-class classification by introducing a multi-objective bilevel optimization framework for data sampling, which improves F1 scores by 1-15% through enhanced diversity.

Two-class classification problems are often characterized by an imbalance between the number of majority and minority datapoints, resulting in poor classification of the minority class in particular. Traditional approaches, such as reweighting the loss function or naïve resampling, risk overfitting and subsequently fail to improve classification because they do not consider the diversity between majority and minority datasets. Such consideration is infeasible because there is no metric that can measure the impact of imbalance on the model. To obviate these challenges, we make two key contributions. First, we introduce MOODS (Multi-Objective Optimization for Data Sampling), a novel multi-objective bilevel optimization framework that guides both synthetic oversampling and majority undersampling. Second, we introduce a validation metric, the "$\epsilon/\delta$ non-overlapping diversification metric", that quantifies the goodness of a sampling method towards model performance. With this metric we experimentally demonstrate state-of-the-art performance, with improvement in diversity driving a 1-15% increase in F1 scores.
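For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of naïve resampling (random minority oversampling with replacement plus random majority undersampling) together with an F1 computation. It is not MOODS and does not implement the paper's diversification metric; the function names `naive_resample` and `f1_score`, the balancing target, and the toy data are all assumptions made for illustration.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (minority) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def naive_resample(X, y, minority=1, seed=0):
    """Balance two classes by duplicating minority points and dropping majority points.

    Hypothetical helper: the per-class target size (half the original
    dataset size) is an illustrative choice, not taken from the paper.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    if not min_idx or not maj_idx:
        raise ValueError("both classes must be present")
    target = (len(min_idx) + len(maj_idx)) // 2
    # Oversample: draw minority indices with replacement (creates exact duplicates).
    new_min = [rng.choice(min_idx) for _ in range(target)]
    # Undersample: draw majority indices without replacement.
    new_maj = rng.sample(maj_idx, min(target, len(maj_idx)))
    idx = new_min + new_maj
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    # Toy data: 90 majority points (label 0) and 10 minority points (label 1).
    X = [[float(i)] for i in range(100)]
    y = [0] * 90 + [1] * 10
    X_bal, y_bal = naive_resample(X, y)
    print(len(y_bal), sum(y_bal))
    # A classifier that always predicts the majority class scores F1 = 0
    # on the minority class, which is why accuracy alone is misleading here.
    print(f1_score(y, [0] * len(y)))
```

Because the oversampled minority points are exact copies, the resampled set contains no new information about the minority class, which is one way to see the overfitting risk the abstract attributes to naïve resampling.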
