Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)
For researchers using prediction bottlenecks for causal discovery: the method recovers no causal structure beyond what simpler baselines already recover, and the paper provides a benchmark for testing such claims.
The paper tests the claim that Mamba state-space models trained for next-step prediction recover Granger-causal structure via a simple readout, and finds that the claim does not survive falsification: a plain linear bottleneck performs as well or better, tuned Lasso beats it on synthetic benchmarks, and roughly 60% of the intervention advantage is a sample-size confound. The lasting artifact is a reusable falsification benchmark.
A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout $S = |W_{out} W_{in}|$, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at $p < 10^{-5}$. We package the protocol used to test that claim -- standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics ($do(X=c)$, soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms -- as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard $do(X=c)$ interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger -- the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.