ML | LG | ME | Feb 26, 2021

Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy To Game

arXiv:2102.13647v3 | 198 citations | Has Code
Originality: Incremental advance
AI Analysis

This work exposes a flaw in simulated causal discovery benchmarks: they may be easy to game. The finding matters to researchers and practitioners who rely on such evaluations, although the contribution is incremental in that it highlights a specific issue rather than proposing a new solution.

The paper shows that simulated DAG benchmarks often have high varsortability, meaning that marginal variance increases along the causal order. This can inflate the apparent performance of causal discovery algorithms. The advantage may not carry over to real-world data, and on standardized data the same algorithms fail to recover the ground-truth DAG.
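As a minimal illustration of why variance tends to accumulate along the causal order, consider a two-variable linear additive noise model. The symbols $w$, $N_1$, and $N_2$ are introduced here for illustration and do not appear on this page; the noise terms are assumed independent.

```latex
\begin{align*}
X_1 &= N_1, \qquad X_2 = w X_1 + N_2, \qquad N_1 \perp N_2, \\
\operatorname{Var}(X_2) &= w^2 \operatorname{Var}(X_1) + \operatorname{Var}(N_2), \\
\operatorname{Var}(X_2) > \operatorname{Var}(X_1)
  &\iff (1 - w^2)\operatorname{Var}(X_1) < \operatorname{Var}(N_2).
\end{align*}
```

In particular, whenever $|w| \ge 1$ and $\operatorname{Var}(N_2) > 0$, the effect has strictly larger variance than its cause. Rescaling $X_2$ changes its variance arbitrarily, which is why the pattern depends on measurement scale.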

Simulated DAG models may exhibit properties that, perhaps inadvertently, render their structure identifiable and unexpectedly affect structure learning algorithms. Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models. We introduce varsortability as a measure of the agreement between the order of increasing marginal variance and the causal order. For commonly sampled graphs and model parameters, we show that the remarkable performance of some continuous structure learning algorithms can be explained by high varsortability and matched by a simple baseline method. Yet, this performance may not transfer to real-world data where varsortability may be moderate or dependent on the choice of measurement scales. On standardized data, the same algorithms fail to identify the ground-truth DAG or its Markov equivalence class. While standardization removes the pattern in marginal variance, we show that data generating processes that incur high varsortability also leave a distinct covariance pattern that may be exploited even after standardization. Our findings challenge the significance of generic benchmarks with independently drawn parameters. The code is available at https://github.com/Scriddie/Varsortability.
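The quantities described in the abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names `simulate_linear_anm` and `varsortability_sketch` are hypothetical, the score is computed over ancestor-descendant pairs rather than over individual paths, and ties in variance count as one half.

```python
import numpy as np


def simulate_linear_anm(W, n, rng):
    """Sample n rows from the linear model X = X W + N with unit-variance
    Gaussian noise. W[i, j] != 0 encodes an edge i -> j, and W is assumed
    to describe a DAG (illustrative helper, not from the paper's code)."""
    d = W.shape[0]
    noise = rng.standard_normal((n, d))
    # Solving X (I - W) = N for X gives X = N (I - W)^{-1}.
    return noise @ np.linalg.inv(np.eye(d) - W)


def varsortability_sketch(X, W):
    """Fraction of ancestor-descendant pairs whose marginal variance
    increases from ancestor to descendant; ties count as 0.5.
    Simplified assumption: pairs are counted once, not once per path."""
    d = W.shape[0]
    adj = (W != 0).astype(int)
    reach = np.zeros((d, d), dtype=bool)
    power = np.eye(d, dtype=int)
    for _ in range(d - 1):
        # power marks node pairs joined by a directed path of the current length.
        power = ((power @ adj) > 0).astype(int)
        reach |= power.astype(bool)
    anc, desc = np.nonzero(reach)
    if anc.size == 0:
        return float("nan")  # no directed pairs, score undefined
    var = X.var(axis=0)
    ratio = var[desc] / var[anc]
    score = np.where(np.isclose(ratio, 1.0), 0.5, ratio > 1.0)
    return float(score.mean())


# Chain 0 -> 1 -> 2 with edge weights of 1.5 (arbitrary illustrative values).
W = np.array([[0.0, 1.5, 0.0],
              [0.0, 0.0, 1.5],
              [0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
X = simulate_linear_anm(W, n=5000, rng=rng)

# Standardizing each column removes the marginal-variance pattern.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print("raw:", varsortability_sketch(X, W))
print("standardized:", varsortability_sketch(X_std, W))
# Sorting nodes by variance recovers a candidate causal order on raw data.
print("order by variance:", np.argsort(X.var(axis=0)).tolist())
```

On the raw chain data the variances grow along the causal order, so sorting by variance alone recovers it; after standardization all variances are equal and the score falls to the tie value of 0.5, which is the scale dependence the abstract warns about.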

Code Implementations: 2 repos
