Heads or Tails: A Simple Example of Causal Abstractive Simulation
This work provides a formal foundation for language model simulation by connecting statistical benchmarking to causality. The contribution is incremental but useful for practitioners, philosophers of AI, and mathematicians working on causal abstraction.
The paper formalizes causal abstractive simulation to analyze language models simulating a fair coin toss, demonstrating both failure cases and a success case where the formalism proves a model can simulate a system given its causal description.
This note illustrates how a variety of causal abstraction (arXiv:1707.00819, arXiv:1812.03789), defined here as causal abstractive simulation, can be used to formalize a simple example of language model simulation: simulating a fair coin toss with a language model. Examples illustrate the ways a language model can fail to simulate, and a success case illustrates how the formalism may be used to prove that a language model simulates some other system, given a causal description of that system. This note may be of interest to three groups. For practitioners in the growing field of language model simulation, causal abstractive simulation is a means of connecting ad-hoc statistical benchmarking practices to the solid formal foundation of causality. Philosophers of AI and philosophers of mind may be interested because causal abstractive simulation gives a precise operationalization of the idea that language models are role-playing (arXiv:2402.12422). Mathematicians and others working on causal abstraction may be interested to see a new application of the core ideas that yields a new variation of causal abstraction.
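To make the "statistical benchmarking" side of the comparison concrete, the following is a minimal sketch of the kind of ad-hoc check a practitioner might run on a coin-toss simulator. It is not the formalism of the note, and it says nothing about causal structure. The functions `fair_coin`, `biased_model`, and `total_variation_from_fair` are hypothetical names introduced here for illustration; `biased_model` is a stand-in for sampling from a language model, not a real model call.

```python
import random
from collections import Counter


def fair_coin(rng):
    """Reference system: a fair coin toss."""
    return rng.choice(["heads", "tails"])


def biased_model(rng, p_heads=0.8):
    """Hypothetical stand-in for a language model that over-produces 'heads'."""
    return "heads" if rng.random() < p_heads else "tails"


def total_variation_from_fair(sampler, n, rng):
    """Empirical total variation distance between sampler outputs and a fair coin."""
    counts = Counter(sampler(rng) for _ in range(n))
    freq_heads = counts["heads"] / n
    freq_tails = counts["tails"] / n
    return 0.5 * (abs(freq_heads - 0.5) + abs(freq_tails - 0.5))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tv_fair = total_variation_from_fair(fair_coin, 10_000, rng)
tv_biased = total_variation_from_fair(biased_model, 10_000, rng)

print(f"fair coin:    TV distance = {tv_fair:.3f}")
print(f"biased model: TV distance = {tv_biased:.3f}")
```

A check like this only compares output frequencies against a target distribution. The note's claim is that causal abstractive simulation grounds such comparisons in a causal description of the system being simulated.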