CL · CV · May 15

VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing

arXiv:2605.15677 · 89.6
Predicted impact: top 34% in CL (last 90 days) · Originality: highly original
AI Analysis

For researchers and practitioners needing precise diagram generation and editing in professional workflows, this work introduces a new paradigm and benchmark to evaluate VLMs' structured visual reasoning.

The paper addresses the lack of structured, controllable diagrammatic capabilities in VLMs by proposing a Diagram-as-Code paradigm using mxGraph XML. VCG-Bench, a benchmark with 1,449 diagrams across 6 domains, shows that current SOTA VLMs struggle with structured fidelity and instruction compliance.

Despite the rapid advancements in Vision-Language Models (VLMs), a critical gap remains in their ability to handle the structured, controllable diagrammatic tasks that are essential for professional workflows. Existing methods predominantly rely on pixel-based synthesis, which operates in probabilistic pixel spaces and is inherently limited in editability and fidelity. Instead, we propose a new Diagram-as-Code paradigm with symbolic logic that leverages mxGraph Extensible Markup Language (XML) for precise diagram generation and editing. We present VCG-Bench, a unified benchmark for visual-centric mxGraph tasks. VCG-Bench comprises: (1) a taxonomized dataset of 1,449 diverse diagrams spanning 6 domains and 15 sub-domains; (2) a paradigm definition that integrates Generation (Vision-to-Code) and Editability (Code-to-Code); and (3) a tailored evaluation protocol employing multi-dimensional metrics, including mxGraph Execution Success Rate and Style Consistency Score (SCS). Experimental results highlight the challenges that current State-of-the-Art (SOTA) VLMs face in structured fidelity and instruction compliance, reflecting the limits of their vision and reasoning capabilities.
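To make the Diagram-as-Code idea concrete, the sketch below shows a small mxGraph XML diagram (two boxes joined by an edge) and a rough structural check over model outputs. The abstract does not define how Execution Success Rate is computed, so the functions `check_mxgraph` and `success_rate` are hypothetical stand-ins that only test XML well-formedness, the `mxGraphModel` root, and edge references. They are not the benchmark's metric.

```python
import xml.etree.ElementTree as ET

# A minimal mxGraph document: two vertices and one edge between them.
SAMPLE = """
<mxGraphModel>
  <root>
    <mxCell id="0"/>
    <mxCell id="1" parent="0"/>
    <mxCell id="2" value="Start" style="rounded=1;" vertex="1" parent="1">
      <mxGeometry x="40" y="40" width="80" height="40" as="geometry"/>
    </mxCell>
    <mxCell id="3" value="End" style="rounded=1;" vertex="1" parent="1">
      <mxGeometry x="200" y="40" width="80" height="40" as="geometry"/>
    </mxCell>
    <mxCell id="4" edge="1" source="2" target="3" parent="1">
      <mxGeometry relative="1" as="geometry"/>
    </mxCell>
  </root>
</mxGraphModel>
"""


def check_mxgraph(xml_text):
    """Hypothetical structural check; returns (ok, reason)."""
    try:
        model = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"not well-formed XML: {exc}"
    if model.tag != "mxGraphModel":
        return False, "root element is not mxGraphModel"
    cells = model.findall("./root/mxCell")
    ids = {cell.get("id") for cell in cells}
    # Every edge endpoint that is present must point at an existing cell id.
    for cell in cells:
        if cell.get("edge") == "1":
            for end in ("source", "target"):
                ref = cell.get(end)
                if ref is not None and ref not in ids:
                    return False, f"edge {cell.get('id')} has dangling {end} {ref}"
    return True, "ok"


def success_rate(outputs):
    """Fraction of outputs that pass check_mxgraph (assumed proxy metric)."""
    if not outputs:
        return 0.0
    return sum(check_mxgraph(text)[0] for text in outputs) / len(outputs)
```

Because the diagram is symbolic, a Code-to-Code edit such as restyling a node amounts to changing one attribute (for example the `style` string on cell `2`) rather than regenerating pixels, which is the editability advantage the abstract claims over pixel-based synthesis.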

