LG ML | Jul 19, 2022

Assaying Out-Of-Distribution Generalization in Transfer Learning

Florian Wenzel, Andrea Dittadi, Peter Vincent Gehler, Carl-Johann Simon-Gabriel, Max Horn, Dominik Zietlow, David Kernert, Chris Russell, Thomas Brox, Bernt Schiele, Bernhard Schölkopf, Francesco Locatello

ETH Zurich

arXiv:2207.09239v2 | 28.291 citations | h-index: 169 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of robust model evaluation for researchers and practitioners, though it is incremental as it synthesizes and tests existing approaches rather than introducing new methods.

The paper tackles out-of-distribution generalization in transfer learning by empirically testing various proxy targets under unified conditions. Based on fine-tuning over 31k networks across 172 dataset pairs, it finds that the relationship between in- and out-of-distribution accuracies is dataset-dependent and more complex than previously thought.

Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting. Our findings confirm that in- and out-of-distribution accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller scale studies.
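To make two of the quantities named in the abstract concrete, the following is a minimal sketch of how accuracy and calibration error might be computed for one model, and how the joint trend of in- and out-of-distribution accuracy could be summarized across models. It is not the authors' evaluation code: the equal-width binned expected calibration error, the use of Pearson correlation, the function names, and the toy numbers are all assumptions made for illustration.

```python
import math


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE (an assumed, commonly used variant):
    sample-weighted mean of |bin accuracy - bin confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        mean_acc = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(mean_acc - mean_conf)
    return ece


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy values, not results from the paper.
    id_acc = [0.70, 0.80, 0.90]   # in-distribution accuracy per model
    ood_acc = [0.50, 0.60, 0.70]  # out-of-distribution accuracy per model
    print("ID/OOD correlation:", pearson(id_acc, ood_acc))

    confidences = [0.9, 0.9, 0.9, 0.9]  # predicted-class probabilities
    correct = [1, 1, 1, 0]              # 1 if the prediction was right
    print("ECE:", expected_calibration_error(confidences, correct))
```

Computing such a correlation separately for each dataset pair, rather than pooling all pairs, is one way to check whether the trend is dataset-dependent, as the abstract reports.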
