LG · May 1

Generating Statistical Charts with Validation-Driven LLM Workflows

arXiv:2605.00800 · 47.7
Predicted impact: top 49% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For researchers and practitioners needing reliable chart generation and diagnostic benchmarks for multimodal reasoning, this work provides a validated workflow and a new dataset that reveals current LLM limitations.

The paper presents a structured LLM workflow for generating diverse, readable statistical charts from tabular data, producing 1,500 charts from 74 datasets with 30,003 question-answer pairs. Evaluation of 16 multimodal LLMs shows that chart-syntax questions are nearly saturated, while value extraction, comparison, and reasoning remain challenging.

Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-answer pairs. We present a structured LLM-based workflow that decomposes chart generation into dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation. By incorporating rendered-output validation, the workflow addresses visualization-specific failure modes such as readability and semantic mismatch. It treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, retaining each chart with its code, dataset context, description, and question-answer pairs. Applied to UCI datasets, the workflow produces 1,500 charts from 74 datasets, spanning 24 chart families and paired with 30,003 question-answer pairs. We evaluate 16 multimodal LLMs (MLLMs) on these chart-question pairs. The results show that chart-syntax questions are nearly saturated, while value extraction, comparison, and reasoning remain more challenging, illustrating the workflow's utility for diagnostic studies of chart-grounded multimodal reasoning.
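
The validation-driven refinement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `ChartArtifact`, `refine_chart`, and the three `toy_*` functions are hypothetical names, and the toy functions stand in for the LLM call, the renderer, and the rendered-output validator, none of which the abstract specifies. The sketch only shows the control flow: synthesize code, render it, check the rendered output, and feed any issues back into the next synthesis round. Description and question-answer generation are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ChartArtifact:
    """One retained chart with its aligned context (hypothetical structure)."""
    dataset_context: str
    proposal: str
    code: str = ""
    history: List[str] = field(default_factory=list)  # validator feedback per failed round


def refine_chart(
    proposal: str,
    dataset_context: str,
    synthesize: Callable[[str, str], str],
    render: Callable[[str], Dict[str, bool]],
    validate: Callable[[Dict[str, bool], str], List[str]],
    max_rounds: int = 3,
) -> Optional[ChartArtifact]:
    """Synthesize, render, and validate until the chart passes or rounds run out."""
    artifact = ChartArtifact(dataset_context, proposal)
    feedback = ""
    for _ in range(max_rounds):
        artifact.code = synthesize(proposal, feedback)
        rendered = render(artifact.code)
        issues = validate(rendered, proposal)  # checks the rendered output, not the code
        if not issues:
            return artifact
        feedback = "; ".join(issues)
        artifact.history.append(feedback)
    return None  # chart is discarded if it never passes validation


# Toy stand-ins (assumptions for illustration only).
def toy_synthesize(proposal: str, feedback: str) -> str:
    # Pretend LLM: adds a fix once it is told that labels overlap.
    fix = ", rotate_labels=True" if "overlap" in feedback else ""
    return f"plot({proposal!r}{fix})"


def toy_render(code: str) -> Dict[str, bool]:
    # Pretend renderer: reports a property that is only visible after rendering.
    return {"labels_overlap": "rotate_labels" not in code}


def toy_validate(rendered: Dict[str, bool], proposal: str) -> List[str]:
    return ["tick labels overlap"] if rendered["labels_overlap"] else []


result = refine_chart(
    "bar chart of class counts",
    "toy tabular dataset",
    toy_synthesize,
    toy_render,
    toy_validate,
)
```

In this toy run the first round fails validation and the second passes, so `result.history` holds one feedback entry. Keeping that history alongside the code is one way to make the process inspectable, in the spirit of the abstract's framing.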

