CLEAR: Can Language Models Really Understand Causal Graphs?
This work evaluates causal reasoning in language models and is aimed at researchers in AI and cognitive science. Its contribution is incremental, since it builds on existing benchmarks and methods.
The authors investigated whether language models can understand causal graphs. They developed an evaluation framework and a benchmark, CLEAR, which comprises 20 tasks across three complexity levels. They found that the models show a preliminary understanding of causal graphs but leave significant room for improvement.
Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we develop a framework to define causal graph understanding, by assessing language models' behaviors through four practical criteria derived from diverse disciplines (e.g., philosophy and psychology). We then develop CLEAR, a novel benchmark that defines three complexity levels and encompasses 20 causal graph-based tasks across these levels. Finally, based on our framework and benchmark, we conduct extensive experiments on six leading language models and summarize five empirical findings. Our results indicate that while language models demonstrate a preliminary understanding of causal graphs, significant potential for improvement remains. Our project website is at https://github.com/OpenCausaLab/CLEAR.
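For readers unfamiliar with causal graphs, the following is a minimal sketch of how one can be represented as a directed graph and queried. The variables, the `GRAPH` dictionary, and the helper functions `parents` and `descendants` are all hypothetical illustrations; they are not taken from CLEAR, and the benchmark's actual tasks and data format may differ.

```python
from collections import deque

# Hypothetical causal graph: each key is a cause, each value lists its direct effects.
# These variables are illustrative only and do not come from the CLEAR benchmark.
GRAPH = {
    "rain": ["wet_grass", "traffic"],
    "sprinkler": ["wet_grass"],
    "wet_grass": ["slippery"],
    "traffic": [],
    "slippery": [],
}


def parents(graph, node):
    """Return the direct causes of `node`, sorted alphabetically."""
    return sorted(src for src, effects in graph.items() if node in effects)


def descendants(graph, node):
    """Return every variable reachable from `node` along directed edges, sorted."""
    seen = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph.get(current, []))
    return sorted(seen)


if __name__ == "__main__":
    print(parents(GRAPH, "wet_grass"))
    print(descendants(GRAPH, "rain"))
```

A question posed to a language model might then read "Which variables directly cause wet_grass?", with the model's answer compared against the output of `parents`. This is one assumed way to ground such a question, not a description of how the authors score their tasks.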