CL | AI | May 21, 2025

ChartCards: A Chart-Metadata Generation Framework for Multi-Task Chart Understanding

arXiv:2505.15046v3 | 4 citations | h-index: 12 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the cost of task-specific fine-tuning for chart understanding in multi-modal AI by offering a scalable dataset and framework. The advance is incremental, since it builds on existing MLLM capabilities.

The paper tackles the high data collection and training costs of multi-task chart understanding by proposing ChartCards, a unified chart-metadata generation framework. Using it, the authors construct MetaChart, a dataset of 10,862 data tables and 85K charts. Fine-tuning on MetaChart yields an average 5% performance improvement across tasks, with gains of up to 28% on specific tasks.

The emergence of Multi-modal Large Language Models (MLLMs) presents new opportunities for chart understanding. However, due to the fine-grained nature of these tasks, applying MLLMs typically requires large, high-quality datasets for task-specific fine-tuning, leading to high data collection and training costs. To address this, we propose ChartCards, a unified chart-metadata generation framework for multi-task chart understanding. ChartCards systematically synthesizes various chart information, including data tables, visualization code, visual elements, and multi-dimensional semantic captions. By structuring this information into organized metadata, ChartCards enables a single chart to support multiple downstream tasks, such as text-to-chart retrieval, chart summarization, chart-to-table conversion, chart description, and chart question answering. Using ChartCards, we further construct MetaChart, a large-scale high-quality dataset containing 10,862 data tables, 85K charts, and 170K high-quality chart captions. We validate the dataset through qualitative crowdsourcing evaluations and quantitative fine-tuning experiments across various chart understanding tasks. Fine-tuning six different models on MetaChart resulted in an average performance improvement of 5% across all tasks. The most notable improvements are seen in text-to-chart retrieval and chart-to-table tasks, with Long-CLIP and Llama 3.2-11B achieving improvements of 17% and 28%, respectively.
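
The idea that one structured metadata record can feed several downstream tasks may be easier to see in code. The following is a minimal sketch, not the authors' implementation: the class name `ChartCard`, its field names, the `to_task_samples` helper, the `.png` naming, and the templated question are all hypothetical and chosen only for illustration. It assumes a record bundles the four kinds of information the abstract lists (data table, visualization code, visual elements, captions).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChartCard:
    """Hypothetical metadata record; field names are illustrative, not from the paper."""
    chart_id: str
    data_table: list[dict[str, object]]   # one dict per table row
    vis_code: str                         # plotting code that renders the chart
    visual_elements: dict[str, str]       # e.g. chart type, axis labels
    captions: list[str] = field(default_factory=list)


def to_task_samples(card: ChartCard) -> dict[str, dict[str, object]]:
    """Derive one sample per downstream task from a single record (assumed layout)."""
    image_ref = f"{card.chart_id}.png"    # assumed image naming scheme
    samples: dict[str, dict[str, object]] = {
        "chart_to_table": {"input": image_ref, "target": card.data_table},
        "chart_description": {"input": image_ref, "target": card.visual_elements},
    }
    if card.captions:
        # A caption serves as a retrieval query and as a summarization target.
        samples["text_to_chart_retrieval"] = {"query": card.captions[0], "target": image_ref}
        samples["chart_summarization"] = {"input": image_ref, "target": card.captions[-1]}
    if card.data_table and len(card.data_table[0]) >= 2:
        # A templated question built from the first row (purely illustrative).
        row = card.data_table[0]
        x_key, y_key = list(row)[:2]
        samples["chart_qa"] = {
            "input": image_ref,
            "question": f"What is {y_key} when {x_key} is {row[x_key]}?",
            "answer": row[y_key],
        }
    return samples


example = ChartCard(
    chart_id="chart_0001",
    data_table=[{"year": 2020, "sales": 12}, {"year": 2021, "sales": 18}],
    vis_code="plt.bar(years, sales)",
    visual_elements={"chart_type": "bar", "x_label": "year", "y_label": "sales"},
    captions=["Bar chart of sales by year.", "Sales rose from 12 to 18 between 2020 and 2021."],
)
example_samples = to_task_samples(example)
```

In this sketch, a single record yields samples for all five task families named in the abstract, which is the property the framework relies on to avoid collecting a separate dataset per task.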

Code Implementations: 1 repo