CV Feb 23

Do Large Language Models Understand Data Visualization Principles?

Martin Sinnona, Valentin Bonas, Viviana Siless, Emmanuel Iarussi

arXiv:2602.20084v1 1.5 h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating visualization design validation for data scientists and designers, but it is incremental as it builds on prior work on LLMs for chart generation and constraint-based systems.

The paper asks whether large language models (LLMs) and vision-language models (VLMs) can reason about and enforce data visualization principles. It evaluates them on detection and correction tasks using a dataset of about 2,000 annotated Vega-Lite specifications and over 300 real-world charts, and finds that frontier models are more effective at correcting violations than at detecting them reliably.

Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can generate charts or flag misleading figures, it remains unclear whether they and their vision-language counterparts (VLMs) can reason about and enforce visualization principles directly. Constraint-based systems encode these principles as logical rules for precise automated checks, but translating them into formal specifications demands expert knowledge. This motivates leveraging LLMs and VLMs as principle checkers that can reason about visual design directly, bypassing the need for symbolic rule specification. In this paper, we present the first systematic evaluation of both LLMs and VLMs on their ability to reason about visualization principles, using hard verification ground truth derived from Answer Set Programming (ASP). We compiled a set of visualization principles expressed as natural-language statements and generated a controlled dataset of approximately 2,000 Vega-Lite specifications annotated with explicit principle violations, complemented by over 300 real-world Vega-Lite charts. We evaluated both checking and fixing tasks, assessing how well models detect principle violations and correct flawed chart specifications. Our work highlights both the promise of large (vision-)language models as flexible validators and editors of visualization designs and the persistent gap with symbolic solvers on more nuanced aspects of visual perception. It also reveals an interesting asymmetry: frontier models tend to be more effective at correcting violations than at detecting them reliably.
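To make the checking and fixing tasks concrete, the following is a minimal sketch of a hand-written rule applied to a Vega-Lite specification held as a Python dictionary. It is not the paper's method: the paper derives ground truth from ASP, and its principle set is not reproduced here. The single principle used (a bar chart's quantitative axis should keep a zero baseline), the label `bar_zero_baseline`, and the helper names `check_spec` and `fix_spec` are assumptions chosen for illustration only.

```python
import copy


def mark_type(spec):
    """Return the mark name whether 'mark' is a string or an object."""
    mark = spec.get("mark")
    return mark.get("type") if isinstance(mark, dict) else mark


def _truncated_bar_channels(spec):
    """Yield (channel, encoding) pairs where a bar's quantitative scale drops zero."""
    if mark_type(spec) != "bar":
        return
    encoding = spec.get("encoding", {})
    for channel in ("x", "y"):
        enc = encoding.get(channel, {})
        scale = enc.get("scale") or {}
        if enc.get("type") == "quantitative" and scale.get("zero") is False:
            yield channel, enc


def check_spec(spec):
    """Checking task: list violations of the illustrative zero-baseline rule."""
    return [f"bar_zero_baseline:{ch}" for ch, _ in _truncated_bar_channels(spec)]


def fix_spec(spec):
    """Fixing task: return a corrected copy, leaving the input untouched."""
    fixed = copy.deepcopy(spec)
    for _, enc in _truncated_bar_channels(fixed):
        enc["scale"]["zero"] = True
    return fixed


# Hypothetical flawed specification: a bar chart with a truncated y axis.
flawed = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative", "scale": {"zero": False}},
    },
}

print(check_spec(flawed))
print(check_spec(fix_spec(flawed)))
```

A symbolic rule like this is exact but must be written by hand for every principle, which is the cost the abstract attributes to constraint-based systems. In the setting the paper evaluates, an LLM or VLM would be given the specification (or the rendered chart) together with the natural-language principle and asked to perform the same two steps.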
