CV · Apr 5, 2025

Evaluating Graphical Perception with Multimodal LLMs

arXiv:2504.04221v1 · h-index: 1 · PacificVis
Originality: Synthesis-oriented
AI Analysis

This work addresses an underexplored area in data visualization for AI researchers, though it is incremental as it applies existing methods to a new domain.

The paper tackles the problem of how multimodal large language models (MLLMs) perform on graphical perception tasks, such as regressing values in charts, by reproducing a classic 1984 experiment and comparing results to human performance. It finds that MLLMs outperform humans in some cases but not in others, with specific results detailed across experiments.

Multimodal Large Language Models (MLLMs) have made remarkable progress in analyzing and understanding images. Despite these advances, accurately regressing values in charts remains an underexplored area for MLLMs. How do MLLMs perform when applied to graphical perception tasks in visualization? Our paper investigates this question by reproducing Cleveland and McGill's seminal 1984 experiment and comparing the results against human task performance. Our study primarily evaluates fine-tuned models, pretrained models, and zero-shot prompting to determine whether they closely match human graphical perception. Our findings show that MLLMs outperform human task performance in some cases but not in others. We report the results of all experiments to foster an understanding of where MLLMs succeed and fail when applied to data visualization.
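To make "regressing values in charts" concrete, here is a minimal sketch of how a model's or a person's judgment could be scored against the true value. It assumes the log absolute error from Cleveland and McGill's original study, log2(|judged percent - true percent| + 1/8). This summary does not state which metric the paper itself reports, and the function names are illustrative rather than taken from the paper.

```python
import math


def log_abs_error(judged_percent: float, true_percent: float) -> float:
    """Cleveland-McGill style error for a single judgment.

    The 1/8 offset keeps the logarithm finite when the judgment is exact.
    """
    return math.log2(abs(judged_percent - true_percent) + 0.125)


def mean_log_abs_error(judged: list[float], truth: list[float]) -> float:
    """Average the per-judgment log absolute error over a set of trials."""
    if len(judged) != len(truth):
        raise ValueError("judged and truth must have the same length")
    if not judged:
        raise ValueError("at least one judgment is required")
    errors = [log_abs_error(j, t) for j, t in zip(judged, truth)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical judgments and ground-truth percentages, for illustration only.
    judged = [50.0, 57.875]
    truth = [50.0, 50.0]
    print(mean_log_abs_error(judged, truth))
```

Lower scores mean more accurate judgments. Under this assumption, scoring human and model answers with the same function puts them on the same scale.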

