AI · Apr 3

CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

arXiv:2604.02794 · 88.0
AI Analysis

This paper addresses accurate chart understanding, a problem that matters to researchers and professionals in fields like science and finance. It is an incremental advance achieved through tool integration.

The paper tackles chart reasoning in multimodal large language models (MLLMs) by proposing CharTool, which integrates external tools for visual perception (image cropping) and numerical computation (code execution). CharTool-7B improves over its base model by +8.0% on CharXiv (Reasoning) and +9.78% on ChartQAPro.

Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the need for fine-grained visual grounding and precise numerical computation. To address these challenges, we first propose DuoChart, a scalable dual-source data pipeline that combines synthesized charts with real-world charts to construct diverse, high-quality chart training data. We then introduce CharTool, which equips MLLMs with external tools, including image cropping for localized visual perception and code-based computation for accurate numerical reasoning. Through agentic reinforcement learning on DuoChart, CharTool learns tool-integrated reasoning grounded in chart content. Extensive experiments on six chart benchmarks show that our method consistently improves over strong MLLM baselines across model scales. Notably, CharTool-7B outperforms the base model by **+8.0%** on CharXiv (Reasoning) and **+9.78%** on ChartQAPro, while achieving competitive performance with substantially larger or proprietary models. Moreover, CharTool demonstrates positive generalization to out-of-domain visual math reasoning benchmarks.
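
The abstract names two kinds of external tools: image cropping for localized perception and code-based computation for numerical reasoning. The sketch below is only an illustration of what such a tool pair might look like. It is not the paper's implementation: the function names (`crop`, `compute`, `run_tool_call`), the tool-call dictionary format, and the choice to model an image as a list of pixel rows are all assumptions made here to keep the example self-contained.

```python
import ast
import operator

# Hypothetical tool 1: crop a rectangular region out of an image.
# The image is modeled as a list of pixel rows (an assumption for this sketch).
def crop(image, left, top, right, bottom):
    """Return rows top..bottom-1 and columns left..right-1 of `image`."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Clamp the box to the image bounds so an imprecise box still gives a valid crop.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]

# Operators the hypothetical computation tool is allowed to apply.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

# Hypothetical tool 2: evaluate plain arithmetic instead of doing it "in the head".
def compute(expression):
    """Evaluate an arithmetic expression made of numeric literals and + - * /."""
    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval"))

# Dispatch one tool call of the assumed form {"tool": name, "args": {...}}.
def run_tool_call(call, image):
    name, args = call["tool"], call["args"]
    if name == "crop":
        return crop(image, **args)
    if name == "compute":
        return compute(**args)
    raise KeyError(f"unknown tool: {name}")
```

In a loop of this kind, a model might first request a crop around a legend or axis region, read values from the enlarged view, and then pass an expression such as `"(42.5 - 37.0) / 37.0 * 100"` to `compute` rather than estimating the result itself. The values in that expression are made up for illustration and do not come from the paper.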

