CV · Jul 20, 2025

FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models

arXiv:2507.14823v1 · 3 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the underexplored domain of financial charts for vision-language models, providing a new benchmark to evaluate and improve their capabilities, though it is incremental as it focuses on benchmarking rather than proposing a new method.

The authors tackled the problem of financial chart comprehension in vision-language models by introducing FinChart-Bench, a benchmark with 1,200 financial chart images and 7,016 questions, and found that current models show significant limitations, such as performance degradation in upgraded models and struggles with spatial reasoning.

Large vision-language models (LVLMs) have made significant progress in chart understanding. However, financial charts, characterized by complex temporal structures and domain-specific terminology, remain notably underexplored. We introduce FinChart-Bench, the first benchmark specifically focused on real-world financial charts. FinChart-Bench comprises 1,200 financial chart images collected from 2015 to 2024, each annotated with True/False (TF), Multiple Choice (MC), and Question Answering (QA) questions, totaling 7,016 questions. We conduct a comprehensive evaluation of 25 state-of-the-art LVLMs on FinChart-Bench. Our evaluation reveals critical insights: (1) the performance gap between open-source and closed-source models is narrowing, (2) performance degradation occurs in upgraded models within families, (3) many models struggle with instruction following, (4) both advanced models show significant limitations in spatial reasoning abilities, and (5) current LVLMs are not reliable enough to serve as automated evaluators. These findings highlight important limitations in current LVLM capabilities for financial chart understanding. The FinChart-Bench dataset is available at https://huggingface.co/datasets/Tizzzzy/FinChart-Bench.
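As a rough illustration of how results on a benchmark with TF, MC, and QA questions might be tallied, the following is a minimal sketch and not the authors' evaluation code. The record layout, the four-option MC label set, and exact-match scoring for QA are all assumptions made for this example; the actual FinChart-Bench schema and scoring rules may differ. Responses that do not match the expected answer format are counted separately, which is one simple way to surface the instruction-following failures the abstract mentions.

```python
from collections import defaultdict

# Hypothetical record layout: (question_type, gold_answer, model_response).
# The real FinChart-Bench schema may differ.
VALID_TF = {"true", "false"}
VALID_MC = {"a", "b", "c", "d"}  # assumed four-option multiple choice


def normalize(text):
    """Lowercase, strip surrounding whitespace, and drop trailing periods."""
    return text.strip().lower().rstrip(".")


def score(records):
    """Return per-type accuracy and per-type counts of malformed responses."""
    correct = defaultdict(int)
    total = defaultdict(int)
    malformed = defaultdict(int)
    for qtype, gold, response in records:
        answer = normalize(response)
        total[qtype] += 1
        # A TF or MC response outside the allowed label set is treated as an
        # instruction-following failure and scored as wrong.
        if qtype == "TF" and answer not in VALID_TF:
            malformed[qtype] += 1
            continue
        if qtype == "MC" and answer not in VALID_MC:
            malformed[qtype] += 1
            continue
        # QA is scored by exact match here, which is an assumption.
        if answer == normalize(gold):
            correct[qtype] += 1
    accuracy = {t: correct[t] / total[t] for t in total}
    return accuracy, dict(malformed)


# Toy records invented for illustration only.
records = [
    ("TF", "True", "true"),
    ("TF", "False", "The chart shows an upward trend."),
    ("MC", "B", "B."),
    ("MC", "C", "A"),
    ("QA", "2019", "2019"),
]
accuracy, malformed = score(records)
print(accuracy)
print(malformed)
```

Reporting malformed responses separately from wrong answers keeps a model that ignores the answer format distinguishable from one that follows it but misreads the chart.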
