LG · AI · Feb 2

CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting

arXiv:2602.02729v1 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses temporal structure entanglement in transformer-based forecasting, offering an incremental improvement for time series analysis.

The paper tackles the problem of disentangling global trends, local shocks, and seasonal patterns in time series forecasting by proposing CAPS, a structured attention mechanism that unifies attention, recurrence, and alignment. It demonstrates competitive performance against seven strong baselines with linear complexity.

This paper presents CAPS (Clock-weighted Aggregation with Prefix-products and Softmax), a structured attention mechanism for time series forecasting that decouples three distinct temporal structures: global trends, local shocks, and seasonal patterns. Standard softmax attention entangles these through global normalization, while recent recurrent models sacrifice long-term, order-independent selection for order-dependent causal structure. CAPS combines SO(2) rotations for phase alignment with three additive gating paths -- Riemann softmax, prefix-product gates, and a Clock baseline -- within a single attention layer. We introduce the Clock mechanism, a learned temporal weighting that modulates these paths through a shared notion of temporal importance. Experiments on long- and short-term forecasting benchmarks surpass vanilla softmax and linear attention mechanisms and demonstrate competitive performance against seven strong baselines with linear complexity. Our code implementation is available at https://github.com/vireshpati/CAPS-Attention.
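
Two of the ingredients named in the abstract, SO(2) rotations and prefix-product gates, are generic constructions that can be illustrated on their own. The NumPy sketch below is not the CAPS implementation: the function names (`rotate_pairs`, `prefix_product_weights`, `gated_recurrence`), the fixed frequency, and the scalar gates are all assumptions made for illustration, and the Riemann softmax and Clock paths are not covered because the abstract does not define them. It shows only that (a) rotating queries and keys by a time-proportional angle makes their dot product depend on the time offset alone, and (b) causal decay weights built from a cumulative product of gates reproduce a gated recurrence.

```python
import numpy as np

def rotate_pairs(x, angles):
    """Rotate each 2-D row of x by the matching angle (an SO(2) element).

    x: array of shape (T, 2); angles: array of shape (T,).
    """
    cos, sin = np.cos(angles), np.sin(angles)
    return np.stack([cos * x[:, 0] - sin * x[:, 1],
                     sin * x[:, 0] + cos * x[:, 1]], axis=-1)

def prefix_product_weights(gates):
    """Causal weights W[t, s] = prod_{r=s+1..t} gates[r] for s <= t, else 0.

    Computed as a ratio of prefix products, so gates must be nonzero.
    """
    prefix = np.cumprod(gates)
    return np.tril(prefix[:, None] / prefix[None, :])

def gated_recurrence(gates, values):
    """Reference loop: h_t = gates[t] * h_{t-1} + values[t], with h_{-1} = 0."""
    h = np.zeros_like(values[0])
    out = []
    for g, v in zip(gates, values):
        h = g * h + v
        out.append(h)
    return np.stack(out)

# Illustrative inputs (hypothetical, not taken from the paper).
T = 8
rng = np.random.default_rng(0)
angles = 0.3 * np.arange(T)                  # assumed fixed frequency
q = rotate_pairs(np.tile([1.0, 2.0], (T, 1)), angles)
k = rotate_pairs(np.tile([0.5, -1.0], (T, 1)), angles)
scores = q @ k.T                             # scores[t, s] depends only on t - s

gates = rng.uniform(0.5, 1.0, size=T)        # assumed scalar gates in [0.5, 1)
values = rng.normal(size=(T, 3))
mixed = prefix_product_weights(gates) @ values
```

In this sketch, `mixed` matches `gated_recurrence(gates, values)` up to floating-point error. The prefix-product form is useful for parallel computation, though dividing by a prefix product can become numerically unstable on long sequences with small gates.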
