CL PM ST AP Oct 16, 2023

Towards reducing hallucination in extracting information from financial reports using Large Language Models

Bhaskarjit Sarmah, Tianjie Zhu, Dhagash Mehta, Stefano Pasquali

arXiv:2310.10760v15.219 citations h-index: 6

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge financial analysts face in extracting insights from unstructured financial transcripts accurately and at scale, though it appears incremental in that it applies existing techniques to a specific domain.

The authors extract information from the Q&A sections of financial reports using Large Language Models combined with retrieval-augmented generation and metadata to reduce hallucination, and report high accuracy and efficiency on objective metrics.

For a financial analyst, the question-and-answer (Q&A) segment of a company's financial report is a crucial source of information for analysis and investment decisions. However, extracting valuable insights from the Q&A section poses considerable challenges: conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human error, while Optical Character Recognition (OCR) and similar techniques struggle to process unstructured transcript text accurately, often missing the subtle linguistic nuances that drive investor decisions. Here, we demonstrate the use of Large Language Models (LLMs) to extract information from earnings report transcripts efficiently and rapidly while ensuring high accuracy, transforming the extraction process and reducing hallucination by combining retrieval-augmented generation with metadata. We evaluate the outputs of various LLMs with and without our proposed approach using objective metrics for evaluating Q&A systems, and empirically demonstrate the superiority of our method.
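The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea of combining retrieval with metadata, not the authors' implementation. It assumes each transcript chunk carries metadata such as company and quarter, filters on that metadata first, and then ranks the remaining chunks by a toy bag-of-words cosine similarity that stands in for the embedding-based retrieval a real system would likely use. All names here (`Chunk`, `retrieve`, `build_prompt`, the sample companies and text) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of transcript text plus descriptive metadata (hypothetical schema)."""
    text: str
    metadata: dict = field(default_factory=dict)


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, filters, k=2):
    # Step 1: keep only chunks whose metadata matches every filter,
    # so text from the wrong company or quarter never reaches the model.
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    # Step 2: rank the remaining chunks by similarity to the question.
    q = Counter(tokenize(question))
    ranked = sorted(
        candidates,
        key=lambda c: cosine(q, Counter(tokenize(c.text))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    # Ground the model in the retrieved context and label each chunk's origin.
    context = "\n".join(
        f"[{c.metadata.get('company')} {c.metadata.get('quarter')}] {c.text}"
        for c in retrieved
    )
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up sample data for illustration only.
chunks = [
    Chunk("Revenue grew eight percent driven by cloud subscriptions.",
          {"company": "ACME", "quarter": "Q2 2023"}),
    Chunk("We expect capital expenditure to stay flat next year.",
          {"company": "ACME", "quarter": "Q2 2023"}),
    Chunk("Revenue declined due to weaker hardware demand.",
          {"company": "Globex", "quarter": "Q2 2023"}),
]

question = "What drove revenue growth?"
top = retrieve(question, chunks, {"company": "ACME", "quarter": "Q2 2023"}, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt would then be sent to an LLM of choice. Filtering on metadata before ranking is one plausible way to keep a similar-sounding passage from a different company out of the context, which is the kind of mix-up that can lead to hallucinated answers.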
