Generative Interventions for Causal Learning
This work addresses the problem of out-of-distribution generalization for computer vision models, which often fail due to learning spurious correlations, benefiting researchers and practitioners building robust visual AI systems.
This paper introduces a framework that uses generative models to create interventions on features caused by confounding factors, aiming to learn robust visual representations that generalize across different viewpoints, backgrounds, and scene contexts. The method achieves state-of-the-art performance when generalizing from ImageNet to ObjectNet.
We introduce a framework for learning robust visual representations that generalize to new viewpoints, backgrounds, and scene contexts. Discriminative models often learn naturally occurring spurious correlations, which cause them to fail on images outside of the training distribution. In this paper, we show that we can steer generative models to manufacture interventions on features caused by confounding factors. Experiments, visualizations, and theoretical results show this method learns robust representations more consistent with the underlying causal relationships. Our approach improves performance on multiple datasets demanding out-of-distribution generalization, and we demonstrate state-of-the-art performance generalizing from ImageNet to ObjectNet dataset.