LG ML | May 30, 2022

PAC Generalization via Invariant Representations

Advait Parulekar, Karthikeyan Shanmugam, Sanjay Shakkottai

arXiv:2205.15196v34.64 citations | h-index: 47

Originality: Incremental advance

AI Analysis

This work addresses generalization in machine learning for scenarios with diverse training environments, offering theoretical guarantees for robustness to interventions, though it is incremental in extending PAC learning concepts to this domain.

The paper tackles the problem of learning invariant representations for out-of-distribution generalization in linear Structural Equation Models (SEMs), showing finite-sample guarantees with bounds that do not scale in ambient dimension under certain conditions.

One method for obtaining generalizable solutions to machine learning tasks when presented with diverse training environments is to find invariant representations of the data. These are representations of the covariates such that the best model on top of the representation is invariant across training environments. In the context of linear Structural Equation Models (SEMs), invariant representations might allow us to learn models with out-of-distribution guarantees, i.e., models that are robust to interventions in the SEM. To address the invariant representation problem in a finite-sample setting, we consider the notion of ε-approximate invariance. We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen SEMs? This larger collection of SEMs is generated through a parameterized family of interventions. Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that holds probabilistically over a family of linear SEMs without faithfulness assumptions. Our results show bounds that do not scale in ambient dimension when intervention sites are restricted to lie in a constant-size subset of in-degree-bounded nodes. We also show how to extend our results to a linear indirect observation model that incorporates latent variables.
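The question posed in the abstract can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration and is not taken from the paper: the three-variable toy SEM (X1 -> Y -> X2), the intervention family (random changes to the scale of X1 and to the Y -> X2 coefficient), and the "invariance gap", which stands in for approximate invariance by measuring the largest distance between per-environment least-squares solutions on top of a representation. The paper's own definition of ε-approximate invariance may differ.

```python
import numpy as np

def sample_environment(rng, n, x1_scale, b):
    """Draw n samples from a toy linear SEM X1 -> Y -> X2.

    x1_scale and b are the intervened parameters (hypothetical names):
    x1_scale rescales the exogenous noise of X1, b is the Y -> X2 weight.
    """
    x1 = x1_scale * rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)   # mechanism of Y is never intervened on
    x2 = b * y + rng.normal(size=n)     # X2 is a child of Y
    return np.column_stack([x1, x2]), y

def best_linear_model(features, y):
    """Least-squares weights of y on the given features (all variables are zero-mean)."""
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    return w

def invariance_gap(envs, phi):
    """Largest pairwise distance between per-environment optimal weights on top of phi."""
    ws = [best_linear_model(X @ phi, y) for X, y in envs]
    return max(np.linalg.norm(wi - wj) for wi in ws for wj in ws)

rng = np.random.default_rng(0)

def draw_envs(k, n=20000):
    """Sample k environments from the assumed parameterized intervention family."""
    return [
        sample_environment(rng, n, rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0))
        for _ in range(k)
    ]

train_envs = draw_envs(3)     # a few training interventions
unseen_envs = draw_envs(20)   # a larger collection of unseen SEMs

phi_parent = np.array([[1.0], [0.0]])  # representation that keeps only X1, the parent of Y
phi_all = np.eye(2)                    # representation that keeps both covariates

gap_parent_train = invariance_gap(train_envs, phi_parent)
gap_parent_unseen = invariance_gap(unseen_envs, phi_parent)
gap_all_unseen = invariance_gap(unseen_envs, phi_all)

print("parent-only, train gap :", gap_parent_train)
print("parent-only, unseen gap:", gap_parent_unseen)
print("all covariates, unseen :", gap_all_unseen)
```

In this sketch the parent-only representation should keep a small gap on the unseen environments, limited mainly by sampling noise, while the representation that includes the child X2 should show a much larger gap because the optimal weights depend on the intervened coefficient b.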
