LGMay 24, 2025

VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis

arXiv:2505.18570v34 citationsh-index: 5
Originality Incremental advance
AI Analysis

This addresses stock forecasting for financial analysts by offering a novel, zero-shot approach that leverages visual and textual data, though it is incremental as it applies existing VLMs to a new domain.

The paper tackles stock price prediction by introducing VISTA, a training-free framework that uses Vision-Language Models with multi-modal prompts to forecast future prices, achieving up to 89.83% improvement over baselines like ARIMA and text-only methods.

Stock price prediction remains a complex and high-stakes task in financial analysis, traditionally addressed using statistical models or, more recently, language models. In this work, we introduce VISTA (Vision-Language Inference for Stock Time-series Analysis), a novel, training-free framework that leverages Vision-Language Models (VLMs) for multi-modal stock forecasting. VISTA prompts a VLM with both textual representations of historical stock prices and their corresponding line charts to predict future price values. By combining numerical and visual modalities in a zero-shot setting and using carefully designed chain-of-thought prompts, VISTA captures complementary patterns that unimodal approaches often miss. We benchmark VISTA against standard baselines, including ARIMA and text-only LLM-based prompting methods. Experimental results show that VISTA outperforms these baselines by up to 89.83%, demonstrating the effectiveness of multi-modal inference for stock time-series analysis and highlighting the potential of VLMs in financial forecasting tasks without requiring task-specific training.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes