ClimateCause: Complex and Implicit Causal Structures in Climate Reports
For researchers in causal discovery and NLP, this dataset provides a new resource for studying implicit and nested causality in a domain where reasoning over causal networks is critical.
The authors introduce ClimateCause, a manually annotated dataset of complex causal structures from climate reports, and show that causal chain reasoning remains a key challenge for large language models.
Understanding climate change requires reasoning over complex causal networks. Yet existing causal discovery datasets predominantly capture explicit, direct causal relations. We introduce ClimateCause, a dataset of higher-order causal structures, including implicit and nested causality, manually annotated by experts on science-for-policy climate reports. Cause-effect expressions are normalized and disentangled into individual causal relations to facilitate graph construction, with unique annotations for cause-effect correlation, relation type, and spatiotemporal context. We further demonstrate ClimateCause's value for quantifying the readability of a statement based on the semantic complexity of its underlying causal graph. Finally, benchmarking large language models on correlation inference and causal chain reasoning identifies the latter as a key challenge.
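To make the graph-construction step concrete, here is a minimal Python sketch, assuming each disentangled relation is stored as a record with a cause, an effect, and a correlation label. The `CausalRelation` fields, the label values `"positive"` and `"negative"`, and the helpers `build_graph`, `causal_chains`, and `chain_sign` are hypothetical and do not reflect ClimateCause's actual schema; the example statement is invented for illustration and is not taken from the dataset. The sketch links relations into a directed graph, enumerates causal chains with a depth-first search, and composes correlation signs along each chain.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CausalRelation:
    # Hypothetical record for one disentangled cause-effect pair
    # (not the dataset's real schema).
    cause: str
    effect: str
    correlation: str  # assumed labels: "positive" or "negative"
    relation_type: str = "direct"  # assumed label set
    context: Optional[str] = None  # free-text spatiotemporal context


def build_graph(relations):
    """Index relations by their cause so each node maps to its outgoing edges."""
    graph = defaultdict(list)
    for rel in relations:
        graph[rel.cause].append(rel)
    return graph


def causal_chains(graph, start, target, max_len=5):
    """Enumerate simple paths of relations from start to target (depth-first)."""
    chains = []
    stack = [(start, [], {start})]
    while stack:
        node, path, seen = stack.pop()
        if node == target and path:
            chains.append(path)
            continue
        if len(path) >= max_len:
            continue
        # .get avoids inserting empty entries into the defaultdict
        for rel in graph.get(node, []):
            if rel.effect not in seen:
                stack.append((rel.effect, path + [rel], seen | {rel.effect}))
    return chains


def chain_sign(chain):
    """Compose correlation signs: an even number of negative links is positive."""
    negatives = sum(1 for rel in chain if rel.correlation == "negative")
    return "positive" if negatives % 2 == 0 else "negative"


# Invented example: "Global warming accelerates ice sheet melt, raising sea
# level, which in turn shrinks coastal habitat area."
relations = [
    CausalRelation("global warming", "ice sheet melt", "positive"),
    CausalRelation("ice sheet melt", "sea level rise", "positive"),
    CausalRelation("sea level rise", "coastal habitat area", "negative"),
]

graph = build_graph(relations)
for chain in causal_chains(graph, "global warming", "coastal habitat area"):
    nodes = [chain[0].cause] + [rel.effect for rel in chain]
    print(" -> ".join(nodes), chain_sign(chain))
# prints: global warming -> ice sheet melt -> sea level rise -> coastal habitat area negative
```

Composing signs along a chain is one simple, assumed way to pose a multi-hop question over such a graph (for example, does global warming increase or decrease coastal habitat area?); the paper's actual benchmark tasks may be defined differently.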