TS-Arena Technical Report -- A Pre-registered Live Forecasting Platform
This work addresses the problem of invalid benchmarking, which affects researchers and practitioners in time series forecasting. The contribution is incremental in that it builds on existing evaluation concerns.
The paper tackles the evaluation crisis in Time Series Foundation Models (TSFMs) caused by information leakage and illegitimate pattern transfer by introducing TS-Arena, a platform that uses pre-registration on live data streams to enforce strict global temporal splits, with a prototype applied in the energy sector.
While Time Series Foundation Models (TSFMs) offer transformative capabilities for forecasting, they simultaneously risk triggering a fundamental evaluation crisis. This crisis is driven by information leakage due to overlapping training and test sets across different models, as well as the illegitimate transfer of global patterns to test data. Although the ability to learn shared temporal dynamics is a primary strength of these models, their evaluation on historical archives often permits the exploitation of observed global shocks, which violates the independence required for valid benchmarking. We introduce TS-Arena, a platform that restores the operational integrity of forecasting by treating the genuinely unknown future as the definitive test environment. By implementing a pre-registration mechanism on live data streams, the platform ensures that evaluation targets remain physically non-existent during inference, thereby enforcing a strict global temporal split. This methodology establishes a moving temporal frontier that prevents historical contamination and provides an authentic assessment of model generalization. Initially applied within the energy sector, TS-Arena provides a sustainable infrastructure for comparing foundation models under real-world constraints. A prototype of the platform is available at https://huggingface.co/spaces/DAG-UPB/TS-Arena.
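The pre-registration idea described above can be illustrated with a minimal sketch. It assumes a simple in-memory registry. The class `PreRegistry`, its methods `register` and `score`, the model name `model_a`, and all values are hypothetical and are not taken from the TS-Arena code base. The sketch only shows the core rule: a forecast is accepted only while its target timestamp still lies in the future, and it is scored only once the actual value has been observed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class PreRegistry:
    """Hypothetical in-memory registry; not the TS-Arena implementation."""

    # Maps (model name, target timestamp) to the registered forecast value.
    forecasts: dict = field(default_factory=dict)

    def register(self, model: str, target_time: datetime, value: float,
                 now: datetime) -> bool:
        # Accept a forecast only if its target lies strictly in the future,
        # so the ground truth cannot exist yet at submission time.
        if target_time <= now:
            return False
        self.forecasts[(model, target_time)] = value
        return True

    def score(self, actuals: dict, now: datetime) -> dict:
        # Mean absolute error per model, restricted to targets whose
        # timestamp has passed and whose actual value has been observed.
        errors = {}
        for (model, target_time), value in self.forecasts.items():
            if target_time <= now and target_time in actuals:
                errors.setdefault(model, []).append(
                    abs(value - actuals[target_time])
                )
        return {m: sum(e) / len(e) for m, e in errors.items()}


# Illustrative usage with made-up timestamps and values.
t0 = datetime(2025, 1, 1, 12, tzinfo=timezone.utc)
registry = PreRegistry()

# A forecast for one hour ahead is accepted.
accepted = registry.register("model_a", t0 + timedelta(hours=1), 42.0, now=t0)
# A "forecast" for a timestamp that has already passed is rejected.
rejected = registry.register("model_a", t0 - timedelta(hours=1), 40.0, now=t0)

# Two hours later the actual value is available and the forecast is scored.
scores = registry.score({t0 + timedelta(hours=1): 45.0},
                        now=t0 + timedelta(hours=2))
```

In this sketch the current time is passed in explicitly as `now` to keep the example deterministic. A live platform would presumably rely on a trusted server-side clock instead, so that submitters cannot backdate forecasts.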