CL, May 28, 2025

Climate Finance Bench

arXiv:2505.22752v1
Originality: Synthesis-oriented
AI Analysis

The paper provides a domain-specific benchmark for climate finance applications, but its contribution is incremental because it builds on existing RAG methods.

The authors tackled the problem of evaluating question-answering over corporate climate disclosures by creating an open benchmark with 330 expert-validated question-answer pairs from 33 sustainability reports, and found that the retriever's ability to locate relevant passages is the main performance bottleneck.

Climate Finance Bench introduces an open benchmark that targets question-answering over corporate climate disclosures using Large Language Models. We curate 33 recent sustainability reports in English drawn from companies across all 11 GICS sectors and annotate 330 expert-validated question-answer pairs that span pure extraction, numerical reasoning, and logical reasoning. Building on this dataset, we propose a comparison of RAG (retrieval-augmented generation) approaches. We show that the retriever's ability to locate passages that actually contain the answer is the chief performance bottleneck. We further argue for transparent carbon reporting in AI-for-climate applications, highlighting advantages of techniques such as Weight Quantization.
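
The claim that retrieval is the bottleneck can be made concrete with a hit-rate measurement: for each question, check whether the passage holding the answer appears among the top-k retrieved passages. The following is a minimal sketch under stated assumptions, not the authors' evaluation code. The token-overlap retriever, the function names (`tokenize`, `overlap_score`, `retrieve`, `hit_rate_at_k`), and the toy passages and questions are all hypothetical and are not taken from the benchmark.

```python
from collections import Counter

PUNCT = ".,;:()?"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def overlap_score(question, passage):
    """Count tokens shared by question and passage (toy lexical relevance)."""
    q, p = Counter(tokenize(question)), Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def retrieve(question, passages, k):
    """Return the indices of the k highest-scoring passages."""
    ranked = sorted(
        range(len(passages)),
        key=lambda i: overlap_score(question, passages[i]),
        reverse=True,
    )
    return ranked[:k]


def hit_rate_at_k(examples, passages, k):
    """Fraction of questions whose gold passage is among the top-k results."""
    hits = sum(1 for q, gold in examples if gold in retrieve(q, passages, k))
    return hits / len(examples)


# Invented toy data for illustration only (not from the benchmark).
passages = [
    "Scope 1 emissions were 120 ktCO2e in 2023.",
    "The board oversees climate risk through its sustainability committee.",
    "We target net zero emissions across our value chain by 2050.",
]
# Each example pairs a question with the index of its gold passage.
examples = [
    ("What were Scope 1 emissions in 2023?", 0),
    ("Which committee oversees climate risk?", 1),
    ("By what year does the company target net zero?", 2),
    ("When does the firm aim to be carbon neutral?", 2),  # paraphrase
]

for k in (1, 3):
    print(f"hit rate @ {k}: {hit_rate_at_k(examples, passages, k):.2f}")
```

The paraphrased last question shares almost no vocabulary with its gold passage, so a purely lexical retriever ranks that passage low. No generator can answer correctly from passages that do not contain the answer, which is why a low hit rate caps end-to-end accuracy.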

Code Implementations: 1 repo