CL, AI, LG, AP, ME, ML | Nov 12, 2024

Language Models as Causal Effect Generators

arXiv:2411.08019v2 | 4 citations | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the need for better benchmarks in causal inference, particularly for evaluating treatment effect estimation methods. The advance is incremental, since it builds on existing causal modeling frameworks.

The authors tackled the problem of generating realistic counterfactual data for causal inference by introducing sequence-driven structural causal models (SD-SCMs), which use language models to define mechanisms and enable sampling from various causal distributions. They created a benchmark with thousands of datasets, finding that causal methods outperform non-causal ones but even state-of-the-art methods struggle with individualized effect estimation.

In this work, we present sequence-driven structural causal models (SD-SCMs), a framework for specifying causal models with user-defined structure and language-model-defined mechanisms. We characterize how an SD-SCM enables sampling from observational, interventional, and counterfactual distributions according to the desired causal structure. We then leverage this procedure to propose a new type of benchmark for causal inference methods, generating individual-level counterfactual data to test treatment effect estimation. We create an example benchmark consisting of thousands of datasets, and test a suite of popular estimation methods for average, conditional average, and individual treatment effect estimation. We find under this benchmark that (1) causal methods outperform non-causal methods and that (2) even state-of-the-art methods struggle with individualized effect estimation, suggesting this benchmark captures some inherent difficulties in causal estimation. Apart from generating data, this same technique can underpin the auditing of language models for (un)desirable causal effects, such as misinformation or discrimination. We believe SD-SCMs can serve as a useful tool in any application that would benefit from sequential data with controllable causal structure.

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
