CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators
For LLM researchers and practitioners, CauSim provides a method to overcome the scarcity of ground-truth causal reasoning data by generating supervised training data at scale.
CauSim transforms causal reasoning from a scarce-label problem into a scalable supervised one by constructing increasingly complex executable causal simulators, enabling data augmentation and supervision across representations. The framework demonstrates generalization across representations, consistent gains from curriculum scaling and data volume, and LLM self-improvement through self-generated simulators.
Despite surpassing human performance across mathematics, coding, and other knowledge-intensive tasks, large language models (LLMs) continue to struggle with causal reasoning. A core obstacle is the target data itself: causal systems are complex and often expressed in non-executable forms, while ground-truth answers to causal queries are inherently scarce. We introduce CauSim, a framework that turns causal reasoning from a scarce-label problem into a scalable supervised one. CauSim constructs increasingly complex causal simulators: executable structural causal models (SCMs), incrementally built by LLMs, that scale to globally complex systems while maintaining verifiable answers to causal queries. CauSim operates across representations in two directions: it formalizes non-executable causal knowledge into code, which enables data augmentation, and it translates executable SCMs into natural language, which enables supervision in representations that were previously difficult to supervise. We structure our research into two parts: (1) how to construct increasingly complex causal simulators, and (2) a systematic study of what CauSim enables, demonstrating generalization across representations, consistent gains from curriculum scaling and data volume, LLM self-improvement through self-generated simulators, and data augmentation via formalization of existing domain knowledge.
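To make the idea of an executable SCM with verifiable answers concrete, here is a minimal sketch in Python. It is not the CauSim implementation: the variables (`rain`, `sprinkler`, `wet`, `slippery`, `accident`), the noise terms, and the `simulate` helper are all hypothetical and chosen only for illustration. The sketch shows how, once a causal system is written as code, interventional and counterfactual queries can be answered by re-running the simulator, and how adding one more mechanism grows the system without losing that property.

```python
from typing import Callable, Dict, Optional

# A mechanism computes one variable from already-computed variables (v)
# and exogenous noise (u). All names below are illustrative assumptions.
Mechanism = Callable[[Dict[str, float], Dict[str, float]], float]

# Toy SCM, listed in topological order (dicts preserve insertion order).
MECHANISMS: Dict[str, Mechanism] = {
    "rain": lambda v, u: u["u_rain"],
    "sprinkler": lambda v, u: u["u_sprinkler"] * (1 - v["rain"]),
    "wet": lambda v, u: max(v["rain"], v["sprinkler"]),
    "slippery": lambda v, u: v["wet"] * u["u_slip"],
}

# Incremental growth: append one more mechanism that depends on an
# existing variable. Ground-truth answers remain computable.
MECHANISMS["accident"] = lambda v, u: v["slippery"] * u["u_accident"]


def simulate(
    noise: Dict[str, float],
    do: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Evaluate every variable; variables named in `do` are clamped
    to the given value instead of being computed by their mechanism."""
    do = do or {}
    values: Dict[str, float] = {}
    for name, mechanism in MECHANISMS.items():
        values[name] = do[name] if name in do else mechanism(values, noise)
    return values


# One fixed noise setting plays the role of a single "world".
noise = {"u_rain": 0, "u_sprinkler": 1, "u_slip": 1, "u_accident": 1}

# Factual outcome in this world.
factual = simulate(noise)

# Counterfactual query: same noise, but the sprinkler is forced off.
counterfactual = simulate(noise, do={"sprinkler": 0})

print("factual:", factual)
print("counterfactual:", counterfactual)
```

Because the noise is held fixed between the two runs, the difference between `factual["accident"]` and `counterfactual["accident"]` is a ground-truth counterfactual answer for this toy world, which is the kind of label that could then be paired with a code or natural-language description of the system as supervised training data.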