LG | AI | Mar 31, 2025

CITRAS: Covariate-Informed Transformer for Time Series Forecasting

arXiv:2503.24007v3 | 6 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses a practical problem in time series forecasting for domains such as finance and retail: improving accuracy through better use of covariates. The contribution is incremental, since it builds on existing Transformer methods.

The paper tackles the challenge of leveraging covariates whose length differs from that of the target variables by proposing CITRAS, a decoder-only Transformer with two novel mechanisms, KV Shift and Attention Score Smoothing. CITRAS outperforms state-of-the-art models on thirteen real-world benchmarks.

In practical time series forecasting, covariates provide rich contextual information that can potentially enhance the forecast of target variables. Although some covariates extend into the future forecasting horizon (e.g., calendar events, discount schedules), most multivariate models fail to leverage this pivotal insight due to the length discrepancy with target variables. Additionally, capturing the dependency between target variables and covariates is non-trivial, as models must precisely reflect the local impact of covariates while also capturing global cross-variate dependencies. To overcome these challenges, we propose CITRAS, a decoder-only Transformer that flexibly leverages multiple targets, past covariates, and future covariates. While preserving strong autoregressive capabilities, CITRAS introduces two novel mechanisms in patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates future covariates into the forecasting of target variables based on their concurrent dependencies. Additionally, Attention Score Smoothing refines locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the past series of attention scores. Experimentally, CITRAS outperforms state-of-the-art models on thirteen real-world benchmarks from both covariate-informed and multivariate settings, demonstrating its versatile ability to leverage cross-variate and cross-time dependencies for improved forecasting accuracy.
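
To make the two mechanisms more concrete, here is a minimal NumPy sketch of one possible reading of the abstract. It is not the authors' implementation. The function names (`shift_kv`, `smooth_scores`, `cross_variate_step`), the tensor shapes, the one-patch shift, and the use of a causal exponential moving average for smoothing are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def shift_kv(cov_patches: np.ndarray) -> np.ndarray:
    """Assumed KV Shift: drop the first covariate patch so that index i
    holds covariate patch i+1, i.e. the patch concurrent with the target
    patch being predicted from target patch i."""
    return cov_patches[1:]

def smooth_scores(scores: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Assumed smoothing: causal exponential moving average over the
    patch axis (axis 0), so each step only sees past scores."""
    out = np.empty_like(scores, dtype=float)
    out[0] = scores[0]
    for t in range(1, len(scores)):
        out[t] = alpha * scores[t] + (1 - alpha) * out[t - 1]
    return out

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_variate_step(target: np.ndarray, covs: np.ndarray,
                       alpha: float = 0.5) -> np.ndarray:
    """target: (P, d) target patch embeddings.
    covs: (V, P + 1, d) covariate patch embeddings, one patch longer
    than the target because they extend into the forecast horizon.
    Returns (P, d): covariate information mixed into each target patch."""
    shifted = np.stack([shift_kv(c) for c in covs])          # (V, P, d)
    # Patch-wise score between each target patch and each covariate's
    # concurrent (shifted) patch.
    scores = np.einsum("pd,vpd->pv", target, shifted)
    scores = scores / np.sqrt(target.shape[-1])               # (P, V)
    # Smooth along the patch axis, then normalise across variates.
    weights = softmax(smooth_scores(scores, alpha), axis=-1)  # (P, V)
    return np.einsum("pv,vpd->pd", weights, shifted)

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8))    # 4 past target patches, dim 8
covs = rng.normal(size=(3, 5, 8))   # 3 covariates, 4 past + 1 future patch
out = cross_variate_step(target, covs)
print(out.shape)  # (4, 8)
```

In this sketch the shift is what resolves the length discrepancy: the covariates carry one more patch than the target, and dropping the first one leaves sequences of equal length aligned to the prediction step. The smoothing parameter `alpha` is hypothetical; with `alpha=1.0` the scores are left purely patch-local, and smaller values pull them toward a more stable variate-level weighting.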

