LG AI CR · Oct 27, 2021

Simple data balancing achieves competitive worst-group-accuracy

Badr Youbi Idrissi, Martin Arjovsky, Mohammad Pezeshki, David Lopez-Paz

arXiv:2110.14503v2 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of improving worst-group accuracy in machine learning. Its contribution is incremental: rather than proposing a new method, it shows that simple methods can match or outperform more complex state-of-the-art approaches.

The paper tackles the problem of learning classifiers that perform well across known or unknown groups of data by comparing state-of-the-art methods with simple data balancing techniques such as subsampling or reweighting. The results show that these baselines achieve state-of-the-art worst-group accuracy while being faster to train and requiring no additional hyper-parameters.
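Worst-group accuracy, the metric targeted here, is the lowest per-group accuracy of a classifier. The sketch below is a minimal illustration, assuming each evaluation example carries a hashable group key (in these benchmarks a group is commonly a combination of class label and a spurious attribute); the function names are hypothetical and not taken from the authors' code.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracies(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Accuracy computed separately within each group (hypothetical helper)."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(y_true, y_pred, groups) -> float:
    """Minimum over groups of the per-group accuracy."""
    return min(group_accuracies(y_true, y_pred, groups).values())


def average_accuracy(y_true, y_pred) -> float:
    """Plain accuracy over all examples, ignoring groups."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage: a large group "a" classified perfectly hides a small group "b"
# where half the predictions are wrong.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b"]
print(average_accuracy(y_true, y_pred))              # 0.8333333333333334
print(worst_group_accuracy(y_true, y_pred, groups))  # 0.5
```

The gap between the two numbers is what makes imbalanced datasets problematic: average accuracy is dominated by the majority group, while worst-group accuracy exposes the minority group.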

We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of-the-art accuracy, while being faster to train and requiring no additional hyper-parameters. In addition, we highlight that access to group information is most critical for model selection purposes, and not so much during training. All in all, our findings beg closer examination of benchmarks and methods for research in worst-group-accuracy optimization.
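The two balancing baselines described in the abstract can be sketched as follows. This is a minimal illustration, assuming every training example has a hashable key: its class label when balancing classes, or a (label, attribute) pair when balancing groups. The names `subsample_indices` and `reweight` are hypothetical and not the authors' released code. Subsampling keeps, for every key, as many examples as the smallest key has; reweighting gives each example a weight inversely proportional to its key's frequency, which could serve as per-example loss weights or as sampling probabilities.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def subsample_indices(keys: Sequence[Hashable], seed: int = 0) -> List[int]:
    """Indices of a subset with the same number of examples per key.

    Every key is randomly downsampled to the size of the smallest key.
    """
    by_key = defaultdict(list)
    for i, k in enumerate(keys):
        by_key[k].append(i)
    n_min = min(len(idx) for idx in by_key.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    kept: List[int] = []
    for idx in by_key.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def reweight(keys: Sequence[Hashable]) -> List[float]:
    """Per-example weights proportional to 1 / (count of the example's key).

    Scaled so the weights average to 1; every key then carries the same
    total weight, len(keys) / number_of_keys.
    """
    counts = Counter(keys)
    n, k = len(keys), len(counts)
    return [n / (k * counts[key]) for key in keys]


# Usage: four (label, attribute) groups of sizes 6, 2, 3 and 1.
keys = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 3 + [(1, 0)] * 1
print(len(subsample_indices(keys)))                  # 4: one example per group
print(len(subsample_indices([y for y, _ in keys])))  # 8: four per class
print(sorted(set(reweight(keys))))                   # [0.5, 1.0, 1.5, 3.0]
```

Neither function introduces a tunable hyper-parameter beyond the choice of key, which mirrors the abstract's point that these baselines add no extra hyper-parameters. The group keys are what the abstract says matter most at model selection time, for example when ranking checkpoints by validation worst-group accuracy.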

