LG SY Oct 6, 2025

Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models

Nick Janßen, Melanie Schaller, Bodo Rosenhahn

arXiv:2510.04900v14.12 citations h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses ambiguous model evaluation in time series forecasting by giving researchers and practitioners a controlled testbed. It is an incremental advance, as it builds on existing benchmarking methods.

The authors tackle the challenge of evaluating multivariate long-term time series forecasting models by proposing a simulation-based framework. It uses synthetic datasets to test model robustness systematically under controlled noise and frequency conditions. The results show that models degrade when lookback windows do not cover complete periods of the seasonal patterns. They also show model-specific differences: S-Mamba and Autoformer perform best on sawtooth patterns, while R-Linear and iTransformer favor sinusoidal signals.

Understanding the robustness of deep learning models for multivariate long-term time series forecasting (M-LTSF) remains challenging, as evaluations typically rely on real-world datasets with unknown noise properties. We propose a simulation-based evaluation framework that generates parameterizable synthetic datasets, where each dataset instance corresponds to a different configuration of signal components, noise types, signal-to-noise ratios, and frequency characteristics. These configurable components aim to model real-world multivariate time series data without the ambiguity of unknown noise. This framework enables fine-grained, systematic evaluation of M-LTSF models under controlled and diverse scenarios. We benchmark four representative architectures: S-Mamba (state-space), iTransformer (transformer-based), R-Linear (linear), and Autoformer (decomposition-based). Our analysis reveals that all models degrade severely when lookback windows cannot capture complete periods of seasonal patterns in the data. S-Mamba and Autoformer perform best on sawtooth patterns, while R-Linear and iTransformer favor sinusoidal signals. White and Brownian noise universally degrade performance as the signal-to-noise ratio decreases, while S-Mamba shows a specific vulnerability to trend noise and iTransformer to seasonal noise. Further spectral analysis shows that S-Mamba and iTransformer achieve superior frequency reconstruction. This controlled approach, based on our synthetic and principle-driven testbed, offers deeper insights into model-specific strengths and limitations through the aggregation of MSE scores and provides concrete guidance for model selection based on signal characteristics and noise conditions.
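To make the idea of a parameterizable synthetic dataset concrete, the following is a minimal sketch of one possible configuration, not the authors' generator. It builds one channel per seasonal period, using either a sinusoidal or a sawtooth shape, and adds white or Brownian noise scaled to a target signal-to-noise ratio. All function names, the default periods, and the SNR definition (ratio of mean squared signal to mean squared noise, in dB) are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def seasonal_component(t, period, shape="sine"):
    """Unit-amplitude periodic signal; `shape` is 'sine' or 'sawtooth' (hypothetical helper)."""
    phase = (t / period) % 1.0  # position within the current period, in [0, 1)
    if shape == "sine":
        return np.sin(2.0 * np.pi * phase)
    if shape == "sawtooth":
        return 2.0 * phase - 1.0  # linear ramp from -1 towards 1, then reset
    raise ValueError(f"unknown shape: {shape}")


def make_noise(n, kind, rng):
    """Draw unscaled noise: 'white' is i.i.d. Gaussian, 'brownian' is its cumulative sum."""
    white = rng.standard_normal(n)
    if kind == "white":
        return white
    if kind == "brownian":
        return np.cumsum(white)
    raise ValueError(f"unknown noise kind: {kind}")


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals `snr_db` (assumed definition)."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise


def make_dataset(n_steps=1024, periods=(24, 96, 168), shape="sine",
                 noise_kind="white", snr_db=10.0, seed=0):
    """Return (clean, noisy) arrays of shape (n_steps, len(periods))."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    clean_channels, noisy_channels = [], []
    for period in periods:
        clean = seasonal_component(t, period, shape)
        noise = make_noise(n_steps, noise_kind, rng)
        clean_channels.append(clean)
        noisy_channels.append(add_noise_at_snr(clean, noise, snr_db))
    return np.stack(clean_channels, axis=1), np.stack(noisy_channels, axis=1)


if __name__ == "__main__":
    clean, noisy = make_dataset(shape="sawtooth", noise_kind="brownian", snr_db=5.0)
    print(clean.shape, noisy.shape)
```

Returning the clean signal alongside the noisy one keeps the noise known by construction, which is the property the framework relies on. Under this sketch, sweeping `snr_db`, `noise_kind`, `shape`, and `periods` would produce one dataset instance per configuration.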
