CL AI CV | Aug 11, 2025

InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information

Anirudh Iyengar Kaniyar Narayana Iyengar, Srija Mukhopadhyay, Adnan Qidwai, Shubhankar Singh, Dan Roth, Vivek Gupta

arXiv:2508.07630v1 | 12.04 citations | h-index: 7 | IJCNLP-AACL

Originality: Incremental advance

AI Analysis

The paper addresses the problem of limited multimodal reasoning in real-world applications such as scientific reporting and financial analysis. The advance is incremental: it builds on prior benchmarks by shifting the focus to cross-chart tasks.

The paper introduces InterChart, a benchmark for evaluating vision-language models on reasoning across multiple related charts, revealing that models show steep accuracy declines as chart complexity increases, with better performance when charts are decomposed into simpler units.

We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts, a task central to real-world applications such as scientific reporting, financial analysis, and public policy dashboards. Unlike prior benchmarks focusing on isolated, visually uniform charts, InterChart challenges models with diverse question types ranging from entity inference and trend correlation to numerical estimation and abstract multi-step reasoning grounded in 2-3 thematically or structurally related charts. We organize the benchmark into three tiers of increasing difficulty: (1) factual reasoning over individual charts, (2) integrative analysis across synthetically aligned chart sets, and (3) semantic inference over visually complex, real-world chart pairs. Our evaluation of state-of-the-art open and closed-source VLMs reveals consistent and steep accuracy declines as chart complexity increases. We find that models perform better when we decompose multi-entity charts into simpler visual units, underscoring their struggles with cross-chart integration. By exposing these systematic limitations, InterChart provides a rigorous framework for advancing multimodal reasoning in complex, multi-visual environments.
