HC | CL | Jun 19, 2025

Capturing Visualization Design Rationale

arXiv:2506.16571v2 | 2 citations | h-index: 37 | VIS
Originality: Synthesis-oriented
AI Analysis

This paper addresses the scarcity of real-world data for probing visualization design choices. It is aimed primarily at researchers in data visualization and human-computer interaction. The contribution is incremental: it builds on prior work but shifts the focus from decoding visualizations to understanding how they are encoded.

The paper tackles the lack of datasets for understanding visualization design rationale by introducing a new dataset derived from student-created literate visualization notebooks, which combine visual artifacts with design exposition. LLMs are used to generate and categorize question-answer-rationale triples from these notebooks, and the triples are then validated and curated.

Prior natural language datasets for data visualization have focused on tasks such as visualization literacy assessment, insight generation, and visualization generation from natural language instructions. These studies often rely on controlled setups with purpose-built visualizations and artificially constructed questions. As a result, they tend to prioritize the interpretation of visualizations, focusing on decoding visualizations rather than understanding their encoding. In this paper, we present a new dataset and methodology for probing visualization design rationale through natural language. We leverage a unique source of real-world visualizations and natural language narratives: literate visualization notebooks created by students as part of a data visualization course. These notebooks combine visual artifacts with design exposition, in which students make explicit the rationale behind their design decisions. We also use large language models (LLMs) to generate and categorize question-answer-rationale triples from the narratives and articulations in the notebooks. We then carefully validate the triples and curate a dataset that captures and distills the visualization design choices and corresponding rationales of the students.
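To make the notion of a question-answer-rationale triple concrete, here is a minimal sketch of how one such record might be represented and filtered. The class name, field names, the `category` label, and the `curate` helper are all hypothetical and do not come from the paper; the check shown is only a structural filter and is not the authors' validation procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignTriple:
    """One question-answer-rationale record (field names are hypothetical)."""
    question: str
    answer: str
    rationale: str
    category: str  # placeholder label; the paper's actual categories are not listed here


def is_well_formed(triple: DesignTriple) -> bool:
    """Cheap structural check: every field must contain non-whitespace text."""
    return all(
        field.strip()
        for field in (triple.question, triple.answer, triple.rationale, triple.category)
    )


def curate(triples: list[DesignTriple]) -> list[DesignTriple]:
    """Keep well-formed triples and drop exact duplicates, preserving order."""
    seen: set[DesignTriple] = set()
    kept: list[DesignTriple] = []
    for triple in triples:
        if is_well_formed(triple) and triple not in seen:
            seen.add(triple)
            kept.append(triple)
    return kept
```

A frozen dataclass is used so that records are hashable and can be de-duplicated with a set. Any semantic validation, such as checking that a rationale is actually supported by the notebook narrative, would need to sit on top of this.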

Code Implementations: 1 repo