LG Jun 8, 2021

Towards a Theoretical Framework of Out-of-Distribution Generalization

arXiv:2106.04496v3, 148 citations
AI Analysis

This work addresses a central problem in machine learning by providing a theoretical framework for OOD generalization. The contribution is incremental in that it builds on existing ideas of invariant features.

The paper tackles the problem of out-of-distribution (OOD) generalization by proposing rigorous definitions of OOD and of learnability. It introduces an expansion function to quantify invariant features and uses it to prove generalization error bounds. The theory also yields a model selection criterion that, in experiments on benchmark datasets, shows a significant advantage over baselines.

Generalization to out-of-distribution (OOD) data is one of the central problems in modern machine learning. Recently, there has been a surge of attempts to propose algorithms that mainly build upon the idea of extracting invariant features. Although this idea is intuitively reasonable, theoretical understanding of what kind of invariance can guarantee OOD generalization is still limited, and generalization to arbitrary out-of-distribution data is clearly impossible. In this work, we take the first step towards rigorous and quantitative definitions of 1) what OOD is; and 2) what it means to say that an OOD problem is learnable. We also introduce a new concept, the expansion function, which characterizes to what extent the variance is amplified in the test domains over the training domains, and therefore gives a quantitative meaning to invariant features. Based on these, we prove OOD generalization error bounds. It turns out that OOD generalization largely depends on the expansion function. As recently pointed out by Gulrajani and Lopez-Paz (2020), any OOD learning algorithm without a model selection module is incomplete. Our theory naturally induces a model selection criterion. Extensive experiments on benchmark OOD datasets demonstrate that our model selection criterion has a significant advantage over baselines.
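To make the idea of variance being "amplified in the test domains over the training domains" concrete, here is a minimal sketch that measures how much a single feature's distribution varies across domains, first on the training domains alone and then with an unseen test domain included. The paper's exact definitions are not reproduced here: the histogram-based total variation distance, the toy Gaussian data, and the function names `tv_distance` and `variation` are all assumptions made for illustration.

```python
import itertools

import numpy as np


def tv_distance(a, b, bins=20, value_range=(-5.0, 5.0)):
    """Total variation distance between histogram estimates of two 1-D samples.

    Assumption: a histogram over a fixed range is a good enough density
    estimate for this toy example.
    """
    pa, _ = np.histogram(a, bins=bins, range=value_range)
    pb, _ = np.histogram(b, bins=bins, range=value_range)
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    return 0.5 * np.abs(pa - pb).sum()


def variation(feature_by_domain):
    """Largest pairwise distance between the feature's per-domain distributions."""
    return max(
        tv_distance(x, y)
        for x, y in itertools.combinations(feature_by_domain.values(), 2)
    )


rng = np.random.default_rng(0)

# Hypothetical feature values for one class label: two training domains
# that differ slightly, and one unseen test domain that is shifted further.
train = {
    "e1": rng.normal(0.0, 1.0, 5000),
    "e2": rng.normal(0.2, 1.0, 5000),
}
test = {"e3": rng.normal(1.5, 1.0, 5000)}

v_train = variation(train)              # variation seen during training
v_all = variation({**train, **test})    # variation once test domains are included

print(f"variation on training domains: {v_train:.3f}")
print(f"variation on all domains:      {v_all:.3f}")
```

In this toy setup the feature looks nearly invariant on the training domains but varies much more once the test domain is included. An expansion function, as described in the abstract, would bound the second quantity in terms of the first; a feature is useful for OOD generalization only if that bound stays small.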

