LG | AI | Sep 5, 2025

VARMA-Enhanced Transformer for Time Series Forecasting

arXiv:2509.04782v1 | 1 citation | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of capturing fine-grained local dependencies in time series forecasting, with applications in fields such as finance and climate. The advance is incremental: it builds on existing Transformer and VARMA concepts.

The paper proposes VARMAformer, a time series forecasting model that integrates classical VARMA statistical principles into a cross-attention-only Transformer, and reports consistent outperformance over state-of-the-art methods in experiments on benchmark datasets.
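
For reference, the classical VARMA(p, q) model that the summary refers to is usually written in the textbook form below. The symbols are the standard ones and are not taken from the paper: y_t is the observed vector at time t, c is a constant vector, A_i and B_j are coefficient matrices, and ε_t is a white-noise innovation vector.

```latex
\mathbf{y}_t = \mathbf{c}
  + \sum_{i=1}^{p} A_i \, \mathbf{y}_{t-i}
  + \boldsymbol{\varepsilon}_t
  + \sum_{j=1}^{q} B_j \, \boldsymbol{\varepsilon}_{t-j}
```

The first sum is the autoregressive (AR) part, which regresses on past observations; the second sum is the moving-average (MA) part, which regresses on past innovations.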

Transformer-based models have significantly advanced time series forecasting. Recent work, like the Cross-Attention-only Time Series transformer (CATS), shows that removing self-attention can make the model more accurate and efficient. However, these streamlined architectures may overlook the fine-grained, local temporal dependencies effectively captured by classical statistical models such as the Vector AutoRegressive Moving Average (VARMA) model. To address this gap, we propose VARMAformer, a novel architecture that synergizes the efficiency of a cross-attention-only framework with the principles of classical time series analysis. Our model introduces two key innovations: (1) a dedicated VARMA-inspired Feature Extractor (VFE) that explicitly models autoregressive (AR) and moving-average (MA) patterns at the patch level, and (2) a VARMA-Enhanced Attention (VE-atten) mechanism that employs a temporal gate to make queries more context-aware. By fusing these classical insights into a modern backbone, VARMAformer captures both global, long-range dependencies and local, statistical structures. Through extensive experiments on widely used benchmark datasets, we demonstrate that our model consistently outperforms existing state-of-the-art methods. Our work validates the significant benefit of integrating classical statistical insights into modern deep learning frameworks for time series forecasting.
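
To make the idea of AR and MA patterns "at the patch level" more concrete, here is a minimal NumPy sketch. It is not the paper's VFE, whose exact design the abstract does not specify. The function names (`make_patches`, `varma_style_features`), the non-overlapping patching, and the fixed coefficient matrices are all assumptions made for illustration; a trained model would presumably learn its weights.

```python
import numpy as np


def make_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches (assumed patching scheme)."""
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)


def varma_style_features(patches, A, B):
    """Hypothetical patch-level AR + MA recursion, loosely following VARMA(p, q).

    patches: (n, d) array, one row per patch.
    A: list of p (d, d) matrices applied to previous patches (AR part).
    B: list of q (d, d) matrices applied to previous residuals (MA part).
    Returns the one-step predictions and the residuals, both of shape (n, d).
    """
    n, _ = patches.shape
    pred = np.zeros_like(patches)
    resid = np.zeros_like(patches)
    for t in range(n):
        # AR term: linear map of the available previous patches.
        ar = sum(A[i] @ patches[t - 1 - i] for i in range(len(A)) if t - 1 - i >= 0)
        # MA term: linear map of the available previous residuals.
        ma = sum(B[j] @ resid[t - 1 - j] for j in range(len(B)) if t - 1 - j >= 0)
        pred[t] = ar + ma
        # The residual plays the role of the innovation in the next steps.
        resid[t] = patches[t] - pred[t]
    return pred, resid


# Illustrative usage with hand-picked (not learned) coefficients.
patches = make_patches(np.arange(12.0), patch_len=4)
pred, resid = varma_style_features(patches, A=[np.eye(4)], B=[0.5 * np.eye(4)])
```

In this sketch the predictions and residuals could be concatenated with the raw patch embeddings as extra local features; how the paper actually fuses such features into the cross-attention backbone is not stated in the abstract.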
