Climate AI for Corporate Decarbonization Metrics Extraction
This work reduces the labor-intensive manual curation faced by sustainable investors and analysts by automating metric extraction, though the contribution is incremental, since it applies existing LLM methods to a new domain.
The paper tackles the problem of manually extracting corporate greenhouse gas emission targets from non-standardized sustainability disclosures by proposing an automated pipeline using large language models, which improves data collection efficiency and accuracy.
Corporate Greenhouse Gas (GHG) emission targets are important metrics in sustainable investing [12, 16]. To provide a comprehensive view of company emission objectives, we propose an approach to source these metrics from public company disclosures. Curating these metrics manually is labor-intensive: it requires combing through lengthy corporate sustainability disclosures that often do not follow a standard format. The resulting dataset must then be thoroughly validated by Subject Matter Experts (SMEs), which further lengthens the time-to-market. We introduce the Climate Artificial Intelligence for Corporate Decarbonization Metrics Extraction (CAI) model and pipeline, a novel approach that uses Large Language Models (LLMs) to extract and validate linked metrics from corporate disclosures. We demonstrate that the pipeline improves data collection efficiency and accuracy by automating data curation, validation, and metric scoring from public corporate disclosures. We further show that our results are agnostic to the choice of LLM. This framework can be applied broadly to information extraction from textual data.
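To make the extract-then-validate idea concrete, here is a minimal sketch. It is not the CAI implementation, whose prompts, schema, and scoring are not described here. The record fields (`scope`, `base_year`, `target_year`, `reduction_pct`), the prompt wording, the helper names `extract_target` and `validate_target`, and the `fake_llm` stub are all hypothetical. The LLM is modeled as an interchangeable `prompt -> text` function, which echoes the claim that results do not depend on the choice of LLM. Using a real model would mean replacing `fake_llm` with a call to that model's client.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical: any LLM backend is treated as a plain function, prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class EmissionTarget:
    """Hypothetical schema for one set of linked target metrics."""
    scope: str             # e.g. "Scope 1 and 2"
    base_year: int         # year the reduction is measured against
    target_year: int       # year by which the reduction should be achieved
    reduction_pct: float   # percentage reduction relative to the base year


# Illustrative prompt; the {text} placeholder is filled with the disclosure excerpt.
PROMPT = (
    "Extract the company's GHG emission reduction target from the text below. "
    "Answer with JSON containing the keys scope, base_year, target_year, reduction_pct.\n\n"
    "Text:\n{text}"
)


def extract_target(text: str, llm: LLM) -> EmissionTarget:
    """Ask the (pluggable) LLM for the target and parse its JSON answer into a record."""
    raw = json.loads(llm(PROMPT.format(text=text)))
    return EmissionTarget(
        scope=str(raw["scope"]),
        base_year=int(raw["base_year"]),
        target_year=int(raw["target_year"]),
        reduction_pct=float(raw["reduction_pct"]),
    )


def validate_target(t: EmissionTarget) -> list[str]:
    """Rule-based consistency checks on the linked fields; returns a list of problems."""
    problems = []
    if t.target_year <= t.base_year:
        problems.append("target_year must be later than base_year")
    if not 0 < t.reduction_pct <= 100:
        problems.append("reduction_pct must lie in (0, 100]")
    return problems


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: always returns the same JSON answer.
    return '{"scope": "Scope 1 and 2", "base_year": 2019, "target_year": 2030, "reduction_pct": 50}'


disclosure = (
    "We commit to reducing absolute Scope 1 and 2 emissions 50% by 2030 "
    "from a 2019 base year."
)
target = extract_target(disclosure, fake_llm)
print(target, validate_target(target))
```

Checks like these only catch records whose linked fields are mutually inconsistent (for example, a target year earlier than the base year). They cannot detect a value the model misread from the text, so in a sketch like this they complement SME review rather than replace it.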