CL, CE, LG, CP | Dec 14, 2024

SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation

arXiv:2412.10906v1 | 11 citations | h-index: 44 | Has Code | NAACL
Originality: Incremental advance
AI Analysis

This work addresses the need for advanced NLP tools in the financial sector and ESG reporting, offering a domain-specific solution with incremental improvements.

The authors address the scarcity of open-source LLMs for finance and ESG by introducing the SusGen-30K dataset and the TCFD-Bench benchmark, and by developing the SusGen-GPT models, which achieve state-of-the-art performance across tasks and trail GPT-4 by only 2% with significantly fewer parameters.

The rapid growth of the financial sector and the rising focus on Environmental, Social, and Governance (ESG) considerations highlight the need for advanced NLP tools. However, open-source LLMs proficient in both finance and ESG domains remain scarce. To address this gap, we introduce SusGen-30K, a category-balanced dataset comprising seven financial NLP tasks and ESG report generation, and propose TCFD-Bench, a benchmark for evaluating sustainability report generation. Leveraging this dataset, we developed SusGen-GPT, a suite of models achieving state-of-the-art performance across six adapted and two off-the-shelf tasks, trailing GPT-4 by only 2% despite using 7-8B parameters compared to GPT-4's 1,700B. Based on this, we propose the SusGen system, integrated with Retrieval-Augmented Generation (RAG), to assist in sustainability report generation. This work demonstrates the efficiency of our approach, advancing research in finance and ESG.

Code Implementations: 1 repo
