Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks
This study assesses the capabilities of LLMs in the financial domain, giving researchers and practitioners empirical evidence to guide applications and improvements. Its contribution is incremental, as it evaluates existing models on new data.
The paper investigates how effective ChatGPT and GPT-4 are as general-purpose solvers for financial text analytics. It evaluates both models on eight benchmark datasets across five task categories and compares them with state-of-the-art fine-tuned and domain-specific models, reporting their strengths and limitations.
The most recent large language models (LLMs) such as ChatGPT and GPT-4 have shown exceptional capabilities as generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with little or no adaptation. How effective are such models in the financial domain? Answering this basic question would have a significant impact on many downstream financial analytical tasks. In this paper, we conduct an empirical study and provide experimental evidence of their performance on a wide variety of financial text analytical problems, using eight benchmark datasets from five categories of tasks. We report both the strengths and limitations of the current models by comparing them to the state-of-the-art fine-tuned approaches and the recently released domain-specific pretrained models. We hope our study can help in understanding the capability of the existing models in the financial domain and facilitate further improvements.