FISCAL: Financial Synthetic Claim-document Augmented Learning for Efficient Fact-Checking
FISCAL addresses the need for efficient and reliable fact-checking in financial AI applications. It represents a strong, domain-specific gain rather than a foundational advance.
The paper tackles factual unreliability and computational inefficiency in financial large language models. It proposes FISCAL, a framework for generating synthetic data, and uses that data to train MiniCheck-FISCAL, a lightweight verifier. MiniCheck-FISCAL outperforms its baseline, approaches the accuracy of systems 20x larger, and rivals GPT-4o and Claude-3.5 on external datasets.
Financial applications of large language models (LLMs) require factual reliability and computational efficiency, yet current systems often hallucinate details and depend on prohibitively large models. We propose FISCAL (Financial Synthetic Claim-Document Augmented Learning), a modular framework for generating synthetic data tailored to financial fact-checking. Using FISCAL, we generate a dataset called FISCAL-data and use it to train MiniCheck-FISCAL, a lightweight verifier for numerical financial claims. MiniCheck-FISCAL outperforms its baseline, surpasses GPT-3.5 Turbo and other open-source peers of similar size, and approaches the accuracy of systems 20x larger, such as Mixtral-8x22B and Command R+. On the external datasets FinDVer and Fin-Fact, it rivals GPT-4o and Claude-3.5 while outperforming Gemini-1.5 Flash. These results show that domain-specific synthetic data, combined with efficient fine-tuning, enables compact models to achieve state-of-the-art accuracy, robustness, and scalability for practical financial AI. The dataset and scripts are available in the project repository (link provided in the paper).
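The abstract does not spell out how FISCAL builds its claim-document pairs, so the following is only an illustrative sketch of what synthetic training data for a numerical claim verifier could look like. It assumes a simple scheme in which a claim taken from a document is labeled as supported, and a copy with its first number altered is labeled as unsupported. The names `Example`, `perturb_number`, and `make_pair` are hypothetical and are not taken from the paper or its repository.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    document: str
    claim: str
    label: int  # 1 = supported by the document, 0 = unsupported


# Matches plain integers and decimals only (no thousands separators).
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def perturb_number(claim: str, factor: float = 1.1) -> str:
    """Return a copy of the claim with its first number scaled by `factor`.

    Assumption for illustration: changing a figure is enough to make the
    claim contradict the source document.
    """
    match = NUMBER.search(claim)
    if match is None:
        raise ValueError("claim contains no number to perturb")
    original = float(match.group())
    changed = round(original * factor, 2)
    if changed == original:  # e.g. the original value was 0
        changed = original + 1
    return claim[:match.start()] + f"{changed:g}" + claim[match.end():]


def make_pair(document: str, claim: str) -> List[Example]:
    """Build one supported and one unsupported example from a true claim."""
    return [
        Example(document, claim, 1),
        Example(document, perturb_number(claim), 0),
    ]


if __name__ == "__main__":
    doc = "Revenue was 120 million USD in 2023."
    for example in make_pair(doc, "Revenue was 120 million USD."):
        print(example.label, example.claim)
```

Pairs of this kind could then be used to fine-tune a compact binary classifier over (document, claim) inputs. The actual generation pipeline and training setup are described in the paper and its repository.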