LG · CV · ML · May 29, 2023

On Counterfactual Data Augmentation Under Confounding

arXiv:2305.18183v2 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses spurious correlations in machine learning, a problem relevant to both researchers and practitioners. The contribution appears incremental because it builds on existing counterfactual augmentation methods.

The paper analyzes how confounding biases in training data affect downstream classifiers and proposes counterfactual data augmentation as a remedy. Experiments on MNIST variants and CelebA indicate that the simple augmentation method helps existing state-of-the-art methods achieve good results.

Counterfactual data augmentation has recently emerged as a method to mitigate confounding biases in the training data. These biases, such as spurious correlations, arise due to various observed and unobserved confounding variables in the data generation process. In this paper, we formally analyze how confounding biases impact downstream classifiers and present a causal viewpoint to the solutions based on counterfactual data augmentation. We explore how removing confounding biases serves as a means to learn invariant features, ultimately aiding in generalization beyond the observed data distribution. Additionally, we present a straightforward yet powerful algorithm for generating counterfactual images, which effectively mitigates the influence of confounding effects on downstream classifiers. Through experiments on MNIST variants and the CelebA datasets, we demonstrate how our simple augmentation method helps existing state-of-the-art methods achieve good results.
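The idea that counterfactual augmentation breaks a spurious correlation can be illustrated with a toy sketch. This is not the paper's image-generation algorithm. It assumes a hypothetical tabular setting in which each sample is a `(shape, color, label)` triple, the label depends only on `shape`, and `color` agrees with the label with probability `p_match` because of confounding. All function names and parameters are invented for illustration.

```python
import random


def make_confounded_data(n, p_match=0.9, seed=0):
    """Toy samples (shape, color, label).

    The label is caused by shape alone; color is a spurious feature
    that agrees with the label with probability p_match.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shape = label  # causal feature
        color = label if rng.random() < p_match else 1 - label  # spurious feature
        data.append((shape, color, label))
    return data


def counterfactual_augment(data):
    """Add, for every sample, a copy with the color flipped and the label kept."""
    return data + [(shape, 1 - color, label) for (shape, color, label) in data]


def color_label_agreement(data):
    """Fraction of samples whose color equals their label."""
    return sum(color == label for _, color, label in data) / len(data)


train = make_confounded_data(1000)
augmented = counterfactual_augment(train)

print("agreement before augmentation:", color_label_agreement(train))
print("agreement after augmentation: ", color_label_agreement(augmented))
```

In the augmented set each original sample is paired with a color-flipped copy, so color agrees with the label in exactly half of the samples and no longer carries predictive signal, while shape still determines the label. For images, the flip would have to be produced by a generative procedure, which this sketch does not attempt.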

