LG | AI | May 29, 2025

Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

arXiv:2505.24003v2 | 8 citations | h-index: 26 | Has Code
Originality: Incremental advance
AI Analysis

This work targets long-term time series forecasting, relevant to domains such as finance and climate, and improves accuracy by integrating multi-modal views. The advance is incremental because it builds on existing decomposition and multi-view approaches.

The paper tackles long-term time series forecasting by transforming time series into multi-modal views (images and text) so that pre-trained large vision models can be applied. It identifies an inductive bias toward "forecasting periods" in the state-of-the-art LVM-based forecaster and proposes DMMV, a decomposition-based framework that integrates these views. In comparisons against 14 SOTA models, DMMV achieves the best MSE on 6 out of 8 benchmark datasets.

Time series, typically represented as numerical sequences, can also be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). However, as we identified in this work, the state-of-the-art (SOTA) LVM-based forecaster poses an inductive bias towards "forecasting periods". To harness this bias, we propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual based adaptive decomposition to integrate MMVs for LTSF. Comparative evaluations against 14 SOTA models across diverse datasets show that DMMV outperforms single-view and existing multi-modal baselines, achieving the best mean squared error (MSE) on 6 out of 8 benchmark datasets. The code for this paper is available at: https://github.com/D2I-Group/dmmv.
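To make the two ideas in the abstract concrete, here is a minimal sketch of a trend-seasonal decomposition and of folding a series into an image-like grid by period. It is not the DMMV implementation: the moving-average trend, the period-based folding, and every function name below are assumptions chosen for illustration, and the authors' repository should be consulted for the actual method.

```python
def moving_average_trend(series, window):
    """Centered moving average; edges are padded by repeating the end values."""
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * (window - 1 - half)
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window):
    """Split a series into a trend part and a seasonal (remainder) part."""
    trend = moving_average_trend(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


def to_period_image(series, period):
    """Fold a series into rows of length `period` and min-max scale to [0, 1].

    Each row holds one full period, so a periodic signal shows up as
    repeated rows. Leading values that do not fill a whole period are dropped.
    """
    n_rows = len(series) // period
    if n_rows == 0:
        raise ValueError("series is shorter than one period")
    usable = list(series[len(series) - n_rows * period:])
    lo, hi = min(usable), max(usable)
    scale = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [
        [(v - lo) / scale for v in usable[r * period:(r + 1) * period]]
        for r in range(n_rows)
    ]


# Hypothetical toy input: a sawtooth that repeats every 4 steps.
series = [float(i % 4) for i in range(16)]
trend, seasonal = decompose(series, window=4)
image = to_period_image(series, period=4)
```

In this toy case every row of `image` is identical because the signal repeats exactly every 4 steps. A grid like this could be rendered as a grayscale image and passed to a vision model, while the trend component would be handled by a separate numerical branch under the assumptions above.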

