Climate Finance Bench
The paper provides a domain-specific benchmark for climate finance applications, but its methodological contribution is incremental, since it builds on existing RAG methods.
The authors address the evaluation of question answering over corporate climate disclosures by building an open benchmark of 330 expert-validated question-answer pairs drawn from 33 sustainability reports, and find that the retriever's ability to locate relevant passages is the main performance bottleneck.
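One simple way to quantify a retriever bottleneck is to measure how often the top-k retrieved passages contain the reference answer at all, independently of the generator. The sketch below is an illustration under assumptions, not the paper's evaluation protocol: it uses normalized substring matching, which works for pure-extraction questions but will under-count numerical-reasoning questions whose answer never appears verbatim. The functions `hit_rate_at_k` and `answer_in_passages` and the toy report snippets are hypothetical.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def answer_in_passages(answer: str, passages: Sequence[str]) -> bool:
    """True if the normalized answer string occurs in any of the passages."""
    target = normalize(answer)
    return any(target in normalize(p) for p in passages)


def hit_rate_at_k(
    retrieved: Sequence[Sequence[str]],
    answers: Sequence[str],
    k: int,
) -> float:
    """Fraction of questions whose top-k retrieved passages contain the answer.

    retrieved[i] is the ranked passage list for question i (best first);
    answers[i] is that question's reference answer string.
    """
    if len(retrieved) != len(answers):
        raise ValueError("need one ranked passage list per answer")
    if not answers:
        return 0.0
    hits = sum(
        answer_in_passages(ans, ranked[:k])
        for ranked, ans in zip(retrieved, answers)
    )
    return hits / len(answers)


# Hypothetical toy data: two questions, three ranked chunks each.
retrieved = [
    ["Scope 1 emissions fell to 1.2 Mt CO2e in 2023.",
     "Our board oversees climate risk.",
     "Scope 2 emissions were 0.4 Mt CO2e."],
    ["We target net zero by 2050.",
     "Water withdrawal decreased by 5%.",
     "Scope 3 emissions totalled 9.8 Mt CO2e."],
]
answers = ["1.2 Mt CO2e", "9.8 Mt CO2e"]

print(hit_rate_at_k(retrieved, answers, k=1))  # 0.5
print(hit_rate_at_k(retrieved, answers, k=3))  # 1.0
```

If answer accuracy tracks this hit rate closely, the generator is rarely the limiting factor: it cannot answer from passages the retriever never surfaced.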
Climate Finance Bench introduces an open benchmark that targets question answering over corporate climate disclosures using large language models (LLMs). We curate 33 recent English-language sustainability reports from companies across all 11 GICS sectors and annotate 330 expert-validated question-answer pairs spanning pure extraction, numerical reasoning, and logical reasoning. Using this dataset, we compare retrieval-augmented generation (RAG) approaches and show that the retriever's ability to locate passages that actually contain the answer is the chief performance bottleneck. We further argue for transparent carbon reporting in AI-for-climate applications and highlight the advantages of techniques such as weight quantization.
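To make the weight-quantization point concrete, the sketch below shows generic symmetric per-tensor int8 quantization, where each weight is stored as an 8-bit code `q` plus one shared float `scale` so that `w ≈ scale * q`. This is an assumed textbook scheme, not the configuration used in the paper. Production LLM quantizers (per-channel scales, 4-bit formats, calibration) are more involved. The function names are hypothetical. The point it illustrates is the fourfold reduction in weight storage relative to float32, which lowers memory traffic and hardware requirements at inference time.

```python
from array import array


def quantize_int8(weights: array) -> tuple[array, float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> array:
    """Map int8 codes back to approximate float32 weights."""
    return array("f", (scale * v for v in q))


weights = array("f", [0.12, -0.5, 0.33, 0.01, -0.27, 0.5])
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per weight
int8_bytes = len(q) * q.itemsize              # 1 byte per weight (scale stored once)
# Rounding to the nearest code bounds the per-weight error by scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(fp32_bytes, int8_bytes)       # 24 6
print(max_err <= scale / 2 + 1e-6)  # True
```

The accuracy cost of this trade-off depends on the model and the scheme, so any energy or carbon savings should be measured and reported rather than assumed.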