LG · ML · Aug 21, 2023

Spurious Correlations and Where to Find Them

arXiv:2308.11043v1 · 8 citations · h-index: 33
Originality: Synthesis-oriented
AI Analysis

This work addresses a practical concern for machine learning practitioners, namely models that learn unreliable features, but it appears incremental: it builds on existing hypotheses without introducing a new method.

The paper tackles the problem of spurious correlations in data-driven learning by investigating common hypotheses behind their occurrence, using synthetic datasets from causal graphs to observe patterns connecting these hypotheses and model design choices.

Spurious correlations occur when a model learns unreliable features from the data, and they are a well-known drawback of data-driven learning. Although several algorithms have been proposed to mitigate them, we are yet to jointly derive the indicators of spurious correlations. As a result, solutions built upon standalone hypotheses fail to beat simple empirical risk minimization (ERM) baselines. We collect some of the commonly studied hypotheses behind the occurrence of spurious correlations and investigate their influence on standard ERM baselines using synthetic datasets generated from causal graphs. Subsequently, we observe patterns connecting these hypotheses and model design choices.
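To make the setup in the abstract concrete, here is a minimal sketch of an ERM baseline trained on synthetic data drawn from a toy causal graph. It is not the paper's graph, data generator, or model. The two-feature graph (label causes a noisy core feature and a low-noise spurious feature), the agreement probability `p_spurious`, and the helper names `sample`, `fit_erm`, and `accuracy` are all assumptions made for illustration.

```python
import numpy as np


def sample(n, p_spurious, rng):
    """Draw n points from a hypothetical causal graph: y -> x_core, y -> x_spur."""
    y = rng.integers(0, 2, size=n)
    s = 2 * y - 1  # label mapped to {-1, +1}
    # Core feature: noisy, but its relation to the label never changes.
    x_core = s + rng.normal(0.0, 1.0, size=n)
    # Spurious feature: clean, but agrees with the label only with
    # probability p_spurious, which can differ between environments.
    agree = rng.random(n) < p_spurious
    x_spur = np.where(agree, s, -s) + rng.normal(0.0, 0.1, size=n)
    return np.column_stack([x_core, x_spur]), y


def fit_erm(X, y, lr=0.5, steps=2000):
    """ERM with logistic loss, minimized by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y  # derivative of the log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def accuracy(w, b, X, y):
    """Fraction of points whose predicted class matches the label."""
    return float((((X @ w + b) > 0).astype(int) == y).mean())


rng = np.random.default_rng(0)
X_train, y_train = sample(5000, 0.95, rng)  # spurious feature mostly agrees
X_id, y_id = sample(5000, 0.95, rng)        # same distribution as training
X_ood, y_ood = sample(5000, 0.05, rng)      # spurious relation reversed

w, b = fit_erm(X_train, y_train)
acc_id = accuracy(w, b, X_id, y_id)
acc_ood = accuracy(w, b, X_ood, y_ood)
print(f"weights (core, spurious): {w}")
print(f"in-distribution accuracy: {acc_id:.3f}")
print(f"shifted-distribution accuracy: {acc_ood:.3f}")
```

In this sketch the ERM solution puts substantial weight on the spurious feature because it is the cleaner predictor at training time, so accuracy falls once that relation is reversed. Varying `p_spurious` or the two noise levels is one simple way to probe how such data properties affect the baseline.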

