LG | Dec 8, 2021

The Effect of Model Size on Worst-Group Generalization

arXiv:2112.04094v1 | 7 citations
Originality: Incremental advance
AI Analysis

This paper addresses poor generalization on rare subgroups and offers machine learning practitioners practical advice: use larger pre-trained models. The contribution is incremental, as it builds on existing studies of overparameterization.

The paper investigates how model size affects worst-group generalization when subgroup information is unknown. It finds that increasing model size does not harm, and may improve, worst-group performance across architectures, domains, and initializations, and that larger pre-trained models consistently improve performance on Waterbirds and MultiNLI.

Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known. To gain a more complete picture, we consider the case where subgroup information is unknown. We investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings, varying: 1) architectures (ResNet, VGG, or BERT), 2) domains (vision or natural language processing), 3) model size (width or depth), and 4) initialization (with pre-trained or random weights). Our systematic evaluation reveals that increasing model size does not hurt, and may help, worst-group test performance under ERM across all setups. In particular, increasing pre-trained model size consistently improves performance on Waterbirds and MultiNLI. We advise practitioners to use larger pre-trained models when subgroup labels are unknown.
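The abstract's central metric, worst-group test performance, can be illustrated with a minimal sketch. It assumes the common definition of worst-group accuracy as the minimum per-group accuracy on the test set; the function names, group names, and toy predictions below are hypothetical and are not taken from the paper. Group labels are used here only for evaluation, which matches the setting where they are unavailable during ERM training.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return a dict mapping each group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(y_true, y_pred, groups):
    """Minimum accuracy over all groups (assumed definition)."""
    return min(group_accuracies(y_true, y_pred, groups).values())


def average_accuracy(y_true, y_pred):
    """Plain accuracy over all examples, ignoring groups."""
    matches = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical toy test set: two majority groups and two rare groups,
# loosely modeled on a bird-type / background split.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
groups = [
    "landbird/land", "landbird/land", "landbird/land", "landbird/water",
    "waterbird/water", "waterbird/water", "waterbird/water", "waterbird/land",
]

avg_acc = average_accuracy(y_true, y_pred)
worst_acc = worst_group_accuracy(y_true, y_pred, groups)
print(f"average accuracy: {avg_acc:.2f}")
print(f"worst-group accuracy: {worst_acc:.2f}")
```

In this toy case the model is right on every majority-group example and wrong on both rare-group examples, so average accuracy looks reasonable while worst-group accuracy collapses. That gap is why the two numbers are reported separately when comparing model sizes.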

