A Fine-Grained Analysis on Distribution Shift
This work addresses the critical need for robust models in real-world deployments by providing a comprehensive evaluation framework, though the contribution is incremental, refining existing analysis methods.
The paper introduces a framework for fine-grained analysis of how robust machine learning models are to distribution shift. Training more than 85,000 models across 19 methods, it finds that pretraining and augmentations offer large gains over a standard ERM baseline in many cases, although no single method is best across all datasets and shifts.
Robustness to distribution shifts is critical for deploying machine learning models in the real world. Despite this necessity, there has been little work in defining the underlying mechanisms that cause these shifts and evaluating the robustness of algorithms across multiple, different distribution shifts. To this end, we introduce a framework that enables fine-grained analysis of various distribution shifts. We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods grouped into five categories across both synthetic and real-world datasets. Overall, we train more than 85K models. Our experimental framework can be easily extended to include new methods, shifts, and datasets. We find, unlike previous work~\citep{Gulrajani20}, that progress has been made over a standard ERM baseline; in particular, pretraining and augmentations (learned or heuristic) offer large gains in many cases. However, the best methods are not consistent over different datasets and shifts.