Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction
The paper addresses the problem of predicting future events for applications in fields such as finance and disaster management, but its contribution is incremental, as it applies existing LLMs to a new domain.
This study evaluated the performance of several large language models (LLMs) on future event prediction tasks, assessing them across scenarios like affirmative vs. likelihood questioning, reasoning, and counterfactual analysis using a dataset of news articles before and after the models' training cutoff dates.
Predicting future events is an important activity with applications across multiple fields and domains. For example, the capacity to foresee stock market trends, natural disasters, business developments, or political events can facilitate early preventive measures and uncover new opportunities. Diverse computational methods for predicting the future have been proposed, including predictive analysis, time series forecasting, and simulations. This study evaluates the performance of several large language models (LLMs) in supporting future prediction tasks, an under-explored domain. We assess the models across three scenarios: Affirmative vs. Likelihood questioning, Reasoning, and Counterfactual analysis. For this, we create a dataset by finding news articles and categorizing them by entity type and entity popularity. We gather news articles from before and after the LLMs' training cutoff dates in order to thoroughly test and compare model performance. Our research highlights LLMs' potential and limitations in predictive modeling, providing a foundation for future improvements.
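To make the dataset construction more concrete, the split of news articles around a model's training cutoff can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Article` record, the `split_by_cutoff` helper, the entity-type labels, the example headlines, and the cutoff date are all hypothetical, and articles published exactly on the cutoff date are assumed to count as "before".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one news article."""
    title: str
    published: date
    entity_type: str  # illustrative labels, e.g. "person" or "organization"


def split_by_cutoff(articles, cutoff):
    """Partition articles into (before, after) a model's training cutoff.

    Assumption: an article published on the cutoff date itself is treated
    as potentially seen during training, so it goes into `before`.
    """
    before = [a for a in articles if a.published <= cutoff]
    after = [a for a in articles if a.published > cutoff]
    return before, after


# Made-up articles and cutoff date, for illustration only.
articles = [
    Article("Example headline A", date(2023, 3, 1), "organization"),
    Article("Example headline B", date(2023, 11, 15), "person"),
    Article("Example headline C", date(2024, 2, 2), "organization"),
]
before, after = split_by_cutoff(articles, cutoff=date(2023, 9, 30))
```

Questions built from the `after` group concern events the model could not have seen during training, while the `before` group serves as a comparison set. Because cutoff dates differ between models, the split would need to be recomputed for each model under evaluation.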