From Images to Signals: Are Large Vision Models Useful for Time Series Analysis?
This work evaluates LVMs on time series tasks, offering foundational insights for researchers in time series analysis and multimodal AI, though its contribution is an incremental assessment of an emerging direction.
The study investigates whether Large Vision Models (LVMs) are useful for time series analysis and finds that they are effective for classification but face challenges in forecasting, including a bias toward forecasting periods and a limited ability to use long look-back windows.
Transformer-based models have gained increasing attention in time series research, driving interest in Large Language Models (LLMs) and foundation models for time series analysis. As the field moves toward multi-modality, Large Vision Models (LVMs) are emerging as a promising direction. The effectiveness of Transformers and LLMs in time series has been debated in the past, and a similar question now arises for LVMs: are LVMs truly useful for time series analysis? To address this question, we design and conduct the first principled study involving 4 LVMs, 8 imaging methods, 18 datasets and 26 baselines across both high-level (classification) and low-level (forecasting) tasks, with extensive ablation analysis. Our findings indicate that LVMs are indeed useful for time series classification but face challenges in forecasting. Although effective, the best contemporary LVM forecasters are limited to specific types of LVMs and imaging methods, exhibit a bias toward forecasting periods, and have limited ability to utilize long look-back windows. We hope our findings can serve as a cornerstone for future research on LVM-based and multimodal solutions to different time series tasks.
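To make the notion of an "imaging method" concrete, the following is a minimal sketch of one simple way to turn a univariate series into a 2-D array that a vision model could consume: folding the series by an assumed period so that each column holds one cycle. The function name `series_to_image`, the min-max scaling, and the choice of period are illustrative assumptions. The abstract does not specify which eight imaging methods the study uses, so this should not be read as one of them.

```python
import numpy as np

def series_to_image(x, period):
    """Fold a 1-D series into a 2-D grayscale array, one period per column.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n_cols = len(x) // period
    if n_cols == 0:
        raise ValueError("series is shorter than one period")
    # Keep only the most recent full periods so the reshape is exact.
    x = x[-n_cols * period:]
    # Min-max scale to [0, 1] so values can be treated as pixel intensities.
    lo, hi = x.min(), x.max()
    scaled = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    # Rows index the position within a period, columns index the period.
    return scaled.reshape(n_cols, period).T

# Toy example: four days of hourly data with a 24-step cycle.
t = np.arange(96)
series = np.sin(2 * np.pi * t / 24)
image = series_to_image(series, period=24)
print(image.shape)
```

Under this kind of layout, the image structure depends directly on the chosen period, which gives some intuition for why a forecaster built on such images might be sensitive to periodicity.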