Reasoning and Tools for Human-Level Forecasting
This work addresses the challenge of enabling language models to perform human-like reasoning for real-world decision-making. It proposes a new method for a known bottleneck rather than an incremental improvement.
The paper tackles the problem of distinguishing genuine reasoning from pattern memorization in language models by focusing on forecasting tasks, where the answers are not in the training data. It demonstrates that the RTF framework, which uses reasoning-and-acting agents equipped with tools, can outperform human predictions on questions from competitive forecasting platforms.
Language models (LMs) trained on web-scale datasets are largely successful due to their ability to memorize large amounts of training data, even data that appears in only a few examples. These capabilities are often desirable when evaluating on tasks such as question answering, but they raise the question of whether these models can exhibit genuine reasoning or succeed only at mimicking patterns from the training data. This distinction is particularly salient in forecasting tasks, where the answer is not present in the training data and the model must reason to make logical deductions. We present Reasoning and Tools for Forecasting (RTF), a framework of reasoning-and-acting (ReAct) agents that can dynamically retrieve updated information and run numerical simulations with equipped tools. We evaluate our model on questions from competitive forecasting platforms and demonstrate that our method is competitive with, and can outperform, human predictions. This suggests that LMs, with the right tools, can indeed think and adapt like humans, offering valuable insights for real-world decision-making.
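To make the reasoning-and-acting loop described above more concrete, the following is a minimal sketch of a ReAct-style agent with two tools, one for retrieval and one for numerical simulation. It is an illustration under stated assumptions, not the RTF implementation: the names `search_tool`, `simulate_tool`, `scripted_policy`, and `run_agent` are hypothetical, the retrieval tool returns a canned snippet rather than live results, and a fixed script stands in for the language model that would normally choose each action.

```python
import random
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (action name, observation)


def search_tool(query: str) -> str:
    """Hypothetical retrieval tool; returns a canned snippet, not live results."""
    return "Recent reports put monthly growth near 2% with 5% volatility."


def simulate_tool(arg: str) -> str:
    """Hypothetical simulation tool.

    Parses "mean,std,threshold" and runs a Monte Carlo estimate of the
    probability that a value compounding for 12 steps exceeds the threshold.
    """
    mean, std, threshold = (float(x) for x in arg.split(","))
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    trials = 10_000
    hits = 0
    for _trial in range(trials):
        value = 1.0
        for _month in range(12):
            value *= 1.0 + rng.gauss(mean, std)
        if value > threshold:
            hits += 1
    return f"{hits / trials:.4f}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_tool,
    "simulate": simulate_tool,
}


def scripted_policy(history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the language model: returns (thought, action, argument)."""
    if len(history) == 0:
        return ("I need current data.", "search", "monthly growth rate")
    if len(history) == 1:
        return ("I can simulate with those figures.", "simulate", "0.02,0.05,1.2")
    # The last observation is the simulated probability; report it as the forecast.
    return ("The simulation yields a probability.", "finish", history[-1][1])


def run_agent(policy, tools, max_steps: int = 5) -> Tuple[float, List[Step]]:
    """Alternate between choosing an action and observing its result."""
    history: List[Step] = []
    for _ in range(max_steps):
        _thought, action, arg = policy(history)
        if action == "finish":
            return float(arg), history
        observation = tools[action](arg)
        history.append((action, observation))
    raise RuntimeError("agent did not finish within max_steps")


forecast, trace = run_agent(scripted_policy, TOOLS)
print(f"forecast probability: {forecast:.4f}")
for action, observation in trace:
    print(f"{action}: {observation}")
```

In a real system the scripted policy would be replaced by a language model call that reads the accumulated history and emits the next thought and action, and the search tool would query an up-to-date source. The loop structure, with observations fed back into the next decision, is the part this sketch is meant to show.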