LG AI ML · Oct 10, 2025

Why Do Transformers Fail to Forecast Time Series In-Context?

Yufa Zhou, Yixiao Wang, Surbhi Goel, Anru R. Zhang

arXiv:2510.09776v1 · 5 citations · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses a fundamental limitation of machine learning for time series forecasting, highlighting gaps in theoretical understanding and encouraging a reevaluation of complex architectures, although its specific theoretical insights represent an incremental advance.

The paper examines why Transformers underperform simple linear models in time series forecasting. It shows theoretically that linear self-attention cannot achieve lower expected MSE than linear models, that it asymptotically recovers the optimal linear predictor as the context grows, and that its predictions collapse to the mean under chain-of-thought inference.
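To make the linear-model comparison concrete, here is a minimal Python sketch, not the authors' code. It simulates AR($p$) data, turns one long series into in-context (window, next value) pairs, and compares an ordinary least squares (OLS) fit with a reduced-form linear self-attention predictor $\hat{y}_q = x_q^\top \Gamma \frac{1}{n}\sum_i x_i y_i$, a form commonly used in in-context learning theory. The AR(2) coefficients, the seeds, and the choice of $\Gamma$ as the inverse covariance estimated from a separate "pretraining" series (a hypothetical stand-in for a trained matrix) are all assumptions made for illustration.

```python
import random


def simulate_ar(coeffs, n, noise_sd=1.0, burn_in=200, seed=0):
    """Zero-mean AR(p): x_t = sum_i coeffs[i] * x_{t-1-i} + Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0] * len(coeffs)
    for _ in range(n + burn_in):
        x.append(sum(c * x[-1 - i] for i, c in enumerate(coeffs)) + rng.gauss(0.0, noise_sd))
    return x[len(coeffs) + burn_in:]  # drop the zero start values and the burn-in


def make_pairs(series, p):
    """In-context pairs: feature (x_{t-1}, ..., x_{t-p}), target x_t."""
    feats = [[series[t - 1 - i] for i in range(p)] for t in range(p, len(series))]
    return feats, series[p:]


def second_moments(feats, targets):
    """Empirical E[x x^T] and E[x y]; the data are zero-mean, so no centering."""
    n, p = len(feats), len(feats[0])
    cov = [[sum(f[i] * f[j] for f in feats) / n for j in range(p)] for i in range(p)]
    cross = [sum(f[i] * y for f, y in zip(feats, targets)) / n for i in range(p)]
    return cov, cross


def solve(a, b):
    """Solve a small dense system a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


true_coeffs = [0.5, -0.3]  # assumed stationary AR(2) coefficients
p = len(true_coeffs)

# Hypothetical "pretraining" series, used only to fix Gamma = (E[x x^T])^{-1}.
pre_feats, pre_targets = make_pairs(simulate_ar(true_coeffs, 100_000, seed=1), p)
pretrain_cov, _ = second_moments(pre_feats, pre_targets)

# A fresh long context seen at inference time.
ctx_series = simulate_ar(true_coeffs, 50_000, seed=2)
ctx_feats, ctx_targets = make_pairs(ctx_series, p)
ctx_cov, ctx_cross = second_moments(ctx_feats, ctx_targets)

w_ols = solve(ctx_cov, ctx_cross)       # OLS weights: (X^T X)^{-1} X^T y
w_lsa = solve(pretrain_cov, ctx_cross)  # reduced-form LSA weights: Gamma (1/n) X^T y

x_q = [ctx_series[-1 - i] for i in range(p)]  # query: the p most recent values
for name, w in (("OLS", w_ols), ("LSA", w_lsa)):
    forecast = sum(wi * xi for wi, xi in zip(w, x_q))
    print(f"{name}: weights {[round(wi, 3) for wi in w]}, next-step forecast {forecast:.3f}")
```

With 50,000 context pairs both weight vectors should sit close to the assumed coefficients (0.5, -0.3), in line with the asymptotic-recovery claim. The sketch does not attempt the paper's finite-context MSE comparison, for which OLS is the natural linear baseline.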

Time series forecasting (TSF) remains a challenging and largely unsolved problem in machine learning, despite significant recent efforts leveraging Large Language Models (LLMs), which predominantly rely on Transformer architectures. Empirical evidence consistently shows that even powerful Transformers often fail to outperform much simpler models, e.g., linear models, on TSF tasks; however, a rigorous theoretical understanding of this phenomenon remains limited. In this paper, we provide a theoretical analysis of Transformers' limitations for TSF through the lens of In-Context Learning (ICL) theory. Specifically, under AR($p$) data, we establish that: (1) Linear Self-Attention (LSA) models $\textit{cannot}$ achieve lower expected MSE than classical linear models for in-context forecasting; (2) as the context length approaches infinity, LSA asymptotically recovers the optimal linear predictor; and (3) under Chain-of-Thought (CoT) style inference, predictions collapse to the mean exponentially. We empirically validate these findings through carefully designed experiments. Our theory not only sheds light on several previously underexplored phenomena but also offers practical insights for designing more effective forecasting architectures. We hope our work encourages the broader research community to revisit the fundamental theoretical limitations of TSF and to critically evaluate the direct application of increasingly sophisticated architectures without deeper scrutiny.
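The mean-collapse effect in claim (3) can be illustrated with a minimal sketch, assuming the forecaster has already reduced to a linear AR predictor; here the true AR(2) coefficients stand in for the learned one, and the process mean of 3.0 is hypothetical. Each one-step forecast is fed back as the next input, as in CoT-style rollout. Because no fresh noise enters, the deviation from the mean obeys a homogeneous linear recurrence and shrinks geometrically, at a rate set by the largest root modulus of the AR characteristic polynomial; for these coefficients the roots of $z^2 - 0.5z + 0.3$ are complex with modulus $\sqrt{0.3} \approx 0.548$.

```python
def cot_rollout(history, coeffs, mean, steps):
    """Roll a linear AR(p) forecaster forward, feeding each prediction back as input.

    history: observed values, oldest first (history[-1] is the most recent).
    coeffs:  AR coefficients; coeffs[i] multiplies the value i + 1 steps back.
    mean:    process mean; the model forecasts deviations from it.
    """
    window = list(history)
    path = []
    for _ in range(steps):
        # One-step forecast from the p most recent values (observed or self-generated).
        nxt = mean + sum(c * (window[-1 - i] - mean) for i, c in enumerate(coeffs))
        path.append(nxt)
        window.append(nxt)  # no new noise enters: the model consumes its own output
    return path


coeffs = [0.5, -0.3]  # assumed stationary AR(2) coefficients
mean = 3.0            # hypothetical process mean
path = cot_rollout([1.0, 6.0], coeffs, mean, steps=40)
for k in (0, 4, 9, 19, 39):
    print(f"step {k + 1:2d}: forecast {path[k]:.6f}, |deviation| {abs(path[k] - mean):.2e}")
```

The printed deviations shrink by roughly a factor of 0.548 per step on average, so within a few dozen steps the rollout is indistinguishable from the mean regardless of the starting window.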
