FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis
FinStat2SQL provides a scalable, cost-efficient solution for AI-powered querying of financial data, specifically targeting Vietnamese enterprises that report under local standards such as the Vietnamese Accounting Standards (VAS).
The paper tackles text-to-SQL for financial statement analysis by developing FinStat2SQL, a lightweight pipeline that combines large and small language models. With a fine-tuned 7B model, it achieves 61.33% accuracy with sub-4-second response times on consumer hardware, outperforming GPT-4o-mini.
Despite the advances of large language models, text2sql still faces many challenges, particularly with complex and domain-specific queries. In finance, database designs and financial reporting layouts vary widely between financial entities and countries, which makes text2sql even harder. We present FinStat2SQL, a lightweight text2sql pipeline that enables natural language queries over financial statements. Tailored to local standards such as the Vietnamese Accounting Standards (VAS), it combines large and small language models in a multi-agent setup for entity extraction, SQL generation, and self-correction. We build a domain-specific database and evaluate models on a synthetic QA dataset. A fine-tuned 7B model achieves 61.33% accuracy with sub-4-second response times on consumer hardware, outperforming GPT-4o-mini. FinStat2SQL offers a scalable, cost-efficient solution for financial analysis, making AI-powered querying accessible to Vietnamese enterprises.
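The abstract names three agent roles (entity extraction, SQL generation, self-correction) but does not specify their interfaces. The following is a minimal sketch of how such roles can be chained, under stated assumptions: a hypothetical single-table SQLite schema `financial_statement`, made-up rows, and deterministic stand-in functions (`extract_entities`, `toy_generate_sql`, `answer`, all hypothetical) in places where FinStat2SQL uses language models. Self-correction is modeled as a common execute-and-retry loop that feeds the database error back to the generator. It is not claimed to be the paper's exact mechanism.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical toy schema: one row per (company, fiscal year, line item).
SCHEMA = """
CREATE TABLE financial_statement (
    company TEXT,
    year    INTEGER,
    item    TEXT,
    value   REAL
);
"""

# Assumed signature for the SQL-generation agent: (question, entities, error feedback) -> SQL text.
SqlGenerator = Callable[[str, dict, Optional[str]], str]


def extract_entities(question: str, companies: list[str]) -> dict:
    """Stand-in for the entity-extraction agent: match known company names and 4-digit years."""
    lowered = question.lower()
    found = [c for c in companies if c.lower() in lowered]
    tokens = question.replace("?", " ").split()
    years = [int(t) for t in tokens if t.isdigit() and len(t) == 4]
    return {"companies": found, "years": years}


def toy_generate_sql(question: str, entities: dict, feedback: Optional[str]) -> str:
    """Stand-in for the SQL-generation agent (a language model in FinStat2SQL).

    The first draft deliberately uses a wrong column name; once error
    feedback is supplied it uses the correct one, mimicking a repair.
    """
    column = "value" if feedback else "amount"
    company = entities["companies"][0]
    year = entities["years"][0]
    return (
        f"SELECT {column} FROM financial_statement "
        f"WHERE company = '{company}' AND year = {year} AND item = 'revenue'"
    )


def answer(question: str, conn: sqlite3.Connection,
           generate_sql: SqlGenerator, max_attempts: int = 3):
    """Extract entities, generate SQL, and retry with the database error on failure."""
    companies = [row[0] for row in conn.execute(
        "SELECT DISTINCT company FROM financial_statement")]
    entities = extract_entities(question, companies)
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, entities, feedback)
        try:
            return sql, conn.execute(sql).fetchall(), attempt
        except sqlite3.Error as exc:
            # Self-correction step: pass the failing query and its error message back.
            feedback = f"Query {sql!r} failed: {exc}"
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts; last error: {feedback}")


# Demo with made-up values (not real financial data).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO financial_statement VALUES (?, ?, ?, ?)",
    [("ALPHA", 2023, "revenue", 100.0), ("BETA", 2023, "revenue", 250.0)],
)
sql, rows, attempts = answer("What was ALPHA revenue in 2023?", conn, toy_generate_sql)
print(attempts, rows)  # 2 [(100.0,)]
```

The first generated query fails with "no such column: amount", and the retry that receives this error succeeds. A real deployment would validate or sandbox model-generated SQL before executing it; string-built SQL appears here only because it mirrors the text a model would emit.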