Exploratory Causal Inference in SAEnce
This paper addresses the challenge of scaling causal inference for researchers in fields such as experimental ecology, though its contribution appears incremental because it builds on existing components, namely pretrained foundation models and sparse autoencoders.
The paper tackles the problem of discovering unknown causal effects from unstructured trial data without relying on hand-crafted hypotheses, and reports the first successful unsupervised causal effect identification in a real-world scientific trial.
Randomized Controlled Trials are one of the pillars of science; nevertheless, they rely on hand-crafted hypotheses and expensive analysis. Such constraints prevent causal effect estimation at scale and risk anchoring research on popular yet incomplete hypotheses. We propose to discover the unknown effects of a treatment directly from data. To this end, we turn unstructured data from a trial into meaningful representations via pretrained foundation models and interpret them via a sparse autoencoder. However, discovering significant causal effects at the neural level is not trivial due to multiple-testing issues and effect entanglement. To address these challenges, we introduce Neural Effect Search, a novel recursive procedure that resolves both issues through progressive stratification. After assessing the robustness of our algorithm on semi-synthetic experiments, we showcase, in the context of experimental ecology, the first successful unsupervised causal effect identification on a real-world scientific trial.
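To make the multiple-testing issue mentioned in the abstract concrete, the following is a minimal sketch on synthetic data. It is not the paper's Neural Effect Search procedure: it only tests each sparse-autoencoder neuron for a difference in means between treated and control units, and compares naive thresholding with a Bonferroni correction, used here as a standard stand-in. The function names, the array `codes` (standing in for SAE activations), the planted effect on neuron 7, and the normal approximation are all assumptions made for illustration.

```python
import math
import numpy as np


def neuron_effect_pvalues(codes, treatment):
    """Per-neuron difference in means with a two-sided p-value.

    Uses a Welch-style standard error and a normal approximation,
    which is reasonable only for large samples (assumption).
    """
    treated = codes[treatment == 1]
    control = codes[treatment == 0]
    diff = treated.mean(axis=0) - control.mean(axis=0)
    se = np.sqrt(
        treated.var(axis=0, ddof=1) / len(treated)
        + control.var(axis=0, ddof=1) / len(control)
    )
    z = diff / se
    # Two-sided p-value under N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2))
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return diff, pvals


def bonferroni_select(pvals, alpha=0.05):
    """Indices of neurons that stay significant after Bonferroni correction."""
    return np.flatnonzero(pvals < alpha / len(pvals))


# Synthetic stand-in for SAE activations: 2000 units, 500 neurons.
rng = np.random.default_rng(0)
n_units, n_neurons = 2000, 500
treatment = rng.integers(0, 2, size=n_units)  # randomized assignment
codes = rng.normal(size=(n_units, n_neurons))
codes[:, 7] += 0.5 * treatment  # hypothetical planted effect on neuron 7

diff, pvals = neuron_effect_pvalues(codes, treatment)
naive = np.flatnonzero(pvals < 0.05)  # no correction: many false positives
corrected = bonferroni_select(pvals)  # corrected: controls family-wise error

print("naive discoveries:", len(naive))
print("Bonferroni discoveries:", corrected.tolist())
```

With hundreds of neurons tested at once, the uncorrected threshold flags many null neurons by chance, while the corrected one keeps the planted effect. A flat correction of this kind says nothing about entangled effects, which is the second difficulty the abstract attributes to neuron-level analysis.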