PriceSeer: Evaluating Large Language Models in Real-Time Stock Prediction
This work addresses a need shared by investors and researchers: robust evaluation of LLMs in financial forecasting. Its contribution is incremental, since it adds a new benchmark on top of existing LLM capabilities.
The authors tackle the problem of evaluating large language models (LLMs) in real-time stock prediction by introducing PriceSeer, a live benchmark of 110 U.S. stocks with 249 historical data points each. Their results show that LLMs have potential for generating investment strategies but perform suboptimally on long-term predictions.
Stock prediction, which is closely tied to people's investment activities in fully dynamic, live environments, has been widely studied. Current large language models (LLMs) have shown remarkable potential in various domains, exhibiting expert-level performance through advanced reasoning and contextual understanding. In this paper, we introduce PriceSeer, a live, dynamic, and data-uncontaminated benchmark specifically designed for LLMs performing stock prediction tasks. Specifically, PriceSeer includes 110 U.S. stocks from 11 industry sectors, each with 249 historical data points. Our benchmark implements both internal and external information expansion, in which LLMs receive additional financial indicators, news, and fake news when performing stock price prediction. We evaluate six cutting-edge LLMs under different prediction horizons, demonstrating their potential to generate investment strategies once they produce accurate price predictions for different sectors. Additionally, we analyze LLMs' suboptimal performance in long-term predictions, including their vulnerability to fake news and their weaknesses in specific industries. The code and evaluation data will be open-sourced at https://github.com/BobLiang2113/PriceSeer.
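The abstract does not say how predictions are scored across horizons. As a minimal sketch, assuming a daily close-price series per stock and two common metrics (mean absolute percentage error and directional accuracy, neither confirmed as PriceSeer's), the following shows how h-step-ahead predictions could be scored against realized prices. `StockSeries`, `horizon_targets`, `mape`, and `directional_accuracy` are hypothetical names, and the ticker, prices, and predictions are made-up toy values, not benchmark data.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical per-stock record; field names are illustrative, not PriceSeer's schema.
@dataclass
class StockSeries:
    ticker: str
    sector: str
    closes: list[float]  # daily closes; PriceSeer reports 249 points per stock


def horizon_targets(series: StockSeries, t: int, h: int) -> tuple[float, float]:
    """Return (last observed close at step t, realized close h steps later)."""
    return series.closes[t], series.closes[t + h]


def mape(preds: list[float], actuals: list[float]) -> float:
    """Mean absolute percentage error in percent (assumed metric)."""
    return 100 * mean(abs(p - a) / abs(a) for p, a in zip(preds, actuals))


def directional_accuracy(preds: list[float], lasts: list[float], actuals: list[float]) -> float:
    """Share of cases where the predicted move has the same sign as the realized move (assumed metric)."""
    hits = [(p - l) * (a - l) > 0 for p, l, a in zip(preds, lasts, actuals)]
    return sum(hits) / len(hits)


# Toy example: a short made-up series and stand-in predictions for horizon h = 2.
series = StockSeries("XYZ", "Technology", [100.0, 101.0, 99.0, 102.0, 104.0, 103.0])
h = 2
pairs = [horizon_targets(series, t, h) for t in range(len(series.closes) - h)]
lasts = [last for last, _ in pairs]
actuals = [actual for _, actual in pairs]
preds = [99.5, 101.5, 101.0, 101.0]  # placeholder for model outputs

mape_pct = mape(preds, actuals)
da = directional_accuracy(preds, lasts, actuals)
print(f"MAPE: {mape_pct:.2f}%  directional accuracy: {da:.2f}")  # MAPE: 1.46%  directional accuracy: 0.75
```

Repeating this for several values of h would give a per-horizon error profile, the kind of comparison the abstract describes. Under this definition, a prediction equal to the last observed close counts as a directional miss, which is one design choice among several.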