What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting
This work addresses the need for better benchmarks for assessing multimodal approaches to time series forecasting. The contribution is incremental in that it focuses on evaluation rather than proposing a new forecasting method.
The paper tackles the problem of evaluating multimodal forecasting models by introducing the What If TSF (WIT) benchmark. WIT pairs time series with expert-crafted plausible or counterfactual scenarios to test whether models actually condition their forecasts on contextual text, giving a rigorous testbed for scenario-guided forecasting.
Time series forecasting is critical to real-world decision making, yet most existing approaches remain unimodal and rely on extrapolating historical patterns. While recent progress in large language models (LLMs) highlights the potential for multimodal forecasting, existing benchmarks largely provide retrospective or misaligned raw context, making it unclear whether such models meaningfully leverage textual inputs. In practice, human experts incorporate what-if scenarios with historical evidence, often producing distinct forecasts from the same observations under different scenarios. Inspired by this, we introduce What If TSF (WIT), a multimodal forecasting benchmark designed to evaluate whether models can condition their forecasts on contextual text, especially future scenarios. By providing expert-crafted plausible or counterfactual scenarios, WIT offers a rigorous testbed for scenario-guided multimodal forecasting. The benchmark is available at https://github.com/jinkwan1115/WhatIfTSF.
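
The central question in the abstract, whether a model produces distinct forecasts from the same observations under different scenarios, can be illustrated with a small sketch. Everything here is an assumption for illustration: the forecaster signature, the function names (scenario_sensitivity, unimodal_model, toy_scenario_model), and the mean-absolute-gap measure are hypothetical and are not taken from the WIT benchmark or its repository.

```python
from typing import Callable, List, Sequence

# Hypothetical forecaster signature: (history, scenario_text) -> forecast values.
Forecaster = Callable[[Sequence[float], str], List[float]]


def scenario_sensitivity(model: Forecaster, history: Sequence[float],
                         scenario_a: str, scenario_b: str) -> float:
    """Mean absolute gap between two forecasts made from the same history
    under different scenario texts (0.0 means the forecasts are identical)."""
    forecast_a = model(history, scenario_a)
    forecast_b = model(history, scenario_b)
    if not forecast_a or len(forecast_a) != len(forecast_b):
        raise ValueError("forecasts must be non-empty and share one horizon")
    gaps = [abs(a - b) for a, b in zip(forecast_a, forecast_b)]
    return sum(gaps) / len(gaps)


def unimodal_model(history: Sequence[float], scenario: str) -> List[float]:
    # Toy baseline: repeats the last observation and ignores the scenario text.
    return [history[-1]] * 3


def toy_scenario_model(history: Sequence[float], scenario: str) -> List[float]:
    # Toy stand-in for a multimodal model: a keyword rule shifts the forecast.
    if "increase" in scenario:
        shift = 2.0
    elif "decrease" in scenario:
        shift = -2.0
    else:
        shift = 0.0
    return [history[-1] + shift] * 3


history = [10.0, 11.0, 12.0]
up, down = "demand will increase", "demand will decrease"
ignored_text = scenario_sensitivity(unimodal_model, history, up, down)
used_text = scenario_sensitivity(toy_scenario_model, history, up, down)
```

A gap of zero for the unimodal baseline shows only that the text had no effect. A nonzero gap shows the text changed the forecast, not that it changed it in the right direction, which is why a benchmark with expert-crafted scenarios is needed to judge correctness.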