LG, ML · Dec 1, 2021

Learning Invariant Representations with Missing Data

Mark Goldstein, Jörn-Henrik Jacobsen, Olina Chau, Adriel Saporta, Aahlad Puli, Rajesh Ranganath, Andrew C. Miller

arXiv:2112.00881v2 · 5.55 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses missing nuisance variables in machine learning models. For researchers and practitioners, it offers a method that maintains test performance without fully observed nuisance data. The advance is incremental, since it builds on existing invariance frameworks.

The paper tackles the problem of learning invariant representations when nuisance variables are missing during training. Enforcing independence on only the observed data does not imply independence on the full population, so spurious correlations can still degrade test performance. On simulations and clinical data, the derived MMD estimators achieve test performance similar to estimators that use the full data.

Spurious correlations allow flexible models to predict well during training but poorly on related test distributions. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such as demographics or image background labels, are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. Here we derive MMD estimators used for invariance objectives under missing nuisances. On simulations and clinical data, optimizing through these estimates achieves test performance similar to using estimators that make use of the full data.
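To make the idea of an MMD estimate under missing nuisances concrete, here is a minimal sketch and not the paper's estimator. It assumes a Gaussian RBF kernel, a binary nuisance, and a known probability that each example's nuisance is observed, and it reweights the observed examples by the inverse of that probability. The names `rbf_kernel`, `weighted_mmd2`, `p_obs`, and `bandwidth` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def weighted_mmd2(x, y, wx=None, wy=None, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with optional sample weights.

    Weights are normalized to sum to one within each sample, so uniform
    weights recover the usual unweighted biased estimate.
    """
    wx = np.ones(len(x)) if wx is None else np.asarray(wx, dtype=float)
    wy = np.ones(len(y)) if wy is None else np.asarray(wy, dtype=float)
    wx = wx / wx.sum()
    wy = wy / wy.sum()
    k_xx = wx @ rbf_kernel(x, x, bandwidth) @ wx
    k_yy = wy @ rbf_kernel(y, y, bandwidth) @ wy
    k_xy = wx @ rbf_kernel(x, y, bandwidth) @ wy
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    z = rng.integers(0, 2, size=n)                    # binary nuisance (assumed)
    r = rng.normal(size=(n, 2)) + 0.5 * z[:, None]    # representation that depends on z
    # Assumption: the observation probability is known and depends on r.
    p_obs = 1.0 / (1.0 + np.exp(-r[:, 0]))
    observed = rng.random(n) < p_obs

    # Keep only examples whose nuisance is observed, weighted by 1 / p_obs.
    r_obs, z_obs, w_obs = r[observed], z[observed], 1.0 / p_obs[observed]
    mmd2 = weighted_mmd2(
        r_obs[z_obs == 0], r_obs[z_obs == 1],
        w_obs[z_obs == 0], w_obs[z_obs == 1],
    )
    print(f"inverse-probability-weighted MMD^2 estimate: {mmd2:.4f}")
```

In an invariance objective, a quantity like this would be added as a penalty so that the representation's distribution matches across nuisance values. How the missingness is modeled and corrected in the paper itself is not specified in the abstract above, so the reweighting here should be read as one plausible choice rather than the authors' method.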

