CL AI May 16, 2024

FinTextQA: A Dataset for Long-form Financial Question Answering

Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang

arXiv:2405.09980v1 22.767 citations h-index: 19 ACL

Originality: Incremental advance

AI Analysis

This work addresses the need for better evaluation tools for financial QA systems, though it is incremental as it builds on existing RAG methods.

The authors tackled the lack of diverse and complex datasets for financial question answering by introducing FinTextQA, a dataset of 1,262 QA pairs, and found that a specific RAG-based system configuration with Baichuan2-7B performed competitively with GPT-3.5-turbo in accuracy.

Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts. However, current financial QA datasets lack scope diversity and question complexity. This work introduces FinTextQA, a novel dataset for long-form question answering (LFQA) in finance. FinTextQA comprises 1,262 high-quality, source-attributed QA pairs extracted and selected from finance textbooks and government agency websites. Moreover, we developed a Retrieval-Augmented Generation (RAG)-based LFQA system, comprising an embedder, retriever, reranker, and generator. A multi-faceted evaluation approach, including human ranking, automatic metrics, and GPT-4 scoring, was employed to benchmark the performance of different LFQA system configurations under heightened noise conditions. The results indicate that: (1) among all compared generators, Baichuan2-7B competes closely with GPT-3.5-turbo in accuracy score; (2) the most effective system configuration on our dataset set the embedder, retriever, reranker, and generator to Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B, respectively; (3) models become less susceptible to noise once the context length reaches a specific threshold.
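The four-stage pipeline named in the abstract (embedder, retriever, reranker, generator) can be illustrated with a minimal sketch. This is not the authors' implementation: every function name here is hypothetical, the embedder is a toy bag-of-words stand-in for a model such as Ada2, the reranker uses simple token coverage rather than a cross-encoder such as Bge-Reranker-Base, and the generator only assembles the prompt that a real LLM would receive.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; punctuation and hyphens act as separators.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy embedder (assumption): bag-of-words counts instead of a neural model.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], k: int = 3) -> List[str]:
    # Retriever: rank chunks by embedding similarity and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(question: str, candidates: List[str], k: int = 1) -> List[str]:
    # Toy reranker (assumption): fraction of question tokens found in the chunk.
    q_tokens = set(tokenize(question))

    def coverage(chunk: str) -> float:
        return len(q_tokens & set(tokenize(chunk))) / max(len(q_tokens), 1)

    return sorted(candidates, key=coverage, reverse=True)[:k]


def generate(question: str, contexts: List[str]) -> str:
    # Stub generator: returns the prompt instead of calling an LLM.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Question: {question}\nContext:\n{context_block}"


def answer(question: str, chunks: List[str]) -> str:
    candidates = retrieve(question, chunks, k=3)
    top = rerank(question, candidates, k=1)
    return generate(question, top)


if __name__ == "__main__":
    corpus = [
        "A bond is a fixed-income instrument issued by governments or companies.",
        "Equity represents ownership in a company.",
        "The weather today is sunny.",
    ]
    print(answer("What is a bond?", corpus))
```

Keeping each stage as a separate function mirrors how the paper swaps components to compare configurations; in a real system each stub would be replaced by the corresponding model call.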
