LG · AI · CV · Jun 10, 2022

Causal Balancing for Domain Generalization

arXiv:2206.05263v4 · 32 citations · h-index: 63
Originality: Incremental advance
AI Analysis

The paper addresses spurious correlations in machine learning models for domain generalization and reports incremental improvements on one benchmark, DomainBed.

The paper tackles out-of-domain generalization by proposing a balanced mini-batch sampling strategy that transforms a biased data distribution into a spurious-free balanced distribution, and it reports the best performance among 20 baselines on DomainBed.

While machine learning models rapidly advance the state of the art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a balanced mini-batch sampling strategy to transform a biased data distribution into a spurious-free balanced distribution, based on the invariance of the underlying causal mechanisms for the data generation process. We argue that the Bayes optimal classifiers trained on such a balanced distribution are minimax optimal across a diverse enough environment space. We also provide an identifiability guarantee of the latent variable model of the proposed data generation process, when utilizing enough training environments. Experiments are conducted on DomainBed, demonstrating empirically that our method obtains the best performance across 20 baselines reported on the benchmark.
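
The idea of turning a biased distribution into a balanced one through mini-batch sampling can be illustrated with a small sketch. This is not the paper's implementation. It makes the simplifying assumption that the spurious attribute is observed and discrete, and it swaps in plain inverse-probability reweighting, drawing each example with probability proportional to 1 / p(label | attribute). The function names `balanced_sampling_weights` and `sample_balanced_batch` are hypothetical.

```python
import random
from collections import Counter


def balanced_sampling_weights(labels, attrs):
    """Return per-example sampling probabilities proportional to 1 / p(label | attr).

    Assumption (not from the paper): `attrs` is an observed, discrete
    spurious attribute for each example.
    """
    pair_counts = Counter(zip(labels, attrs))  # count of each (label, attr) cell
    attr_counts = Counter(attrs)               # count of each attr value
    # 1 / p(label | attr) = count(attr) / count(label, attr)
    raw = [attr_counts[a] / pair_counts[(y, a)] for y, a in zip(labels, attrs)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_balanced_batch(labels, attrs, batch_size, seed=0):
    """Draw example indices (with replacement) from the reweighted distribution."""
    rng = random.Random(seed)
    weights = balanced_sampling_weights(labels, attrs)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


if __name__ == "__main__":
    # Toy biased data: label 0 co-occurs mostly with attr 0, label 1 with attr 1.
    labels = [0] * 90 + [1] * 10 + [0] * 10 + [1] * 90
    attrs = [0] * 100 + [1] * 100
    batch = sample_balanced_batch(labels, attrs, batch_size=32)
    print(Counter((labels[i], attrs[i]) for i in batch))
```

Under this reweighting, every label present within an attribute group receives equal total sampling mass, so the label carries no information about the attribute inside a sampled batch. This only holds for (label, attribute) combinations that actually occur in the data.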

Code Implementations: 1 repo
