AI · CV · MA · Aug 21, 2025

See it. Say it. Sorted: Agentic System for Compositional Diagram Generation

arXiv:2508.15222v2 · h-index: 2 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generating precise, compositional diagrams from rough sketches, a task relevant to users in fields such as documentation and design. The advance is incremental: it builds on existing models, and the novelty lies in the system design.

The paper tackles sketch-to-diagram generation by introducing a training-free agentic system that couples a Vision-Language Model with Large Language Models to produce editable SVG programs. On 10 sketches derived from flowcharts in published papers, the authors report more faithful reconstruction of layout and structure than two frontier closed-source models, GPT-5 and Gemini-2.5-Pro.

We study sketch-to-diagram generation: converting rough hand sketches into precise, compositional diagrams. Diffusion models excel at photorealism but struggle with the spatial precision, alignment, and symbolic structure required for flowcharts. We introduce See it. Say it. Sorted., a training-free agentic system that couples a Vision-Language Model (VLM) with Large Language Models (LLMs) to produce editable Scalable Vector Graphics (SVG) programs. The system runs an iterative loop in which a Critic VLM proposes a small set of qualitative, relational edits; multiple candidate LLMs synthesize SVG updates with diverse strategies (conservative->aggressive, alternative, focused); and a Judge VLM selects the best candidate, ensuring stable improvement. This design prioritizes qualitative reasoning over brittle numerical estimates, preserves global constraints (e.g., alignment, connectivity), and naturally supports human-in-the-loop corrections. On 10 sketches derived from flowcharts in published papers, our method more faithfully reconstructs layout and structure than two frontier closed-source image generation LLMs (GPT-5 and Gemini-2.5-Pro), accurately composing primitives (e.g., multi-headed arrows) without inserting unwanted text. Because outputs are programmatic SVGs, the approach is readily extensible to presentation tools (e.g., PowerPoint) via APIs and can be specialized with improved prompts and task-specific tools. The codebase is open-sourced at https://github.com/hantaoZhangrichard/see_it_say_it_sorted.git.
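The iterative loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `refine`, the toy critic, candidates, and judge, and the step sizes are all hypothetical stand-ins, and plain Python functions replace the VLM and LLM calls. The sketch also assumes that the judge may keep the current diagram when no candidate is better, which is one possible way to obtain the stable improvement the abstract mentions.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # diagram state; a real system would hold an SVG program


def refine(
    state: T,
    critic: Callable[[T], List[str]],
    candidates: List[Callable[[T, List[str]], T]],
    judge: Callable[[List[T]], int],
    max_iters: int = 5,
) -> T:
    """Critic -> candidates -> judge loop (hypothetical sketch)."""
    for _ in range(max_iters):
        edits = critic(state)  # qualitative, relational edit suggestions
        if not edits:
            break  # critic has nothing left to fix
        # Index 0 is the current state, so the judge can reject every update.
        proposals = [state] + [make(state, edits) for make in candidates]
        best = judge(proposals)
        if best == 0:
            break  # no candidate improved on the current state
        state = proposals[best]
    return state


# --- Toy stand-ins (assumptions, for illustration only) ---
TARGET_X = 100  # x-position of a box in the imagined reference sketch


def toy_critic(x: int) -> List[str]:
    # Emits a direction only, never a pixel distance.
    if x < TARGET_X:
        return ["move box right"]
    if x > TARGET_X:
        return ["move box left"]
    return []


def make_candidate(step: int) -> Callable[[int, List[str]], int]:
    # Each candidate interprets the same edit with a different step size.
    def apply(x: int, edits: List[str]) -> int:
        return x + step if "right" in edits[0] else x - step
    return apply


def toy_judge(proposals: List[int]) -> int:
    # Picks the proposal closest to the target; ties favor the earliest index.
    return min(range(len(proposals)), key=lambda i: abs(proposals[i] - TARGET_X))


def to_svg(x: int) -> str:
    return f'<svg xmlns="http://www.w3.org/2000/svg"><rect x="{x}" y="10" width="40" height="20"/></svg>'


final_x = refine(0, toy_critic, [make_candidate(10), make_candidate(40)], toy_judge)
print(to_svg(final_x))
```

In this toy run the conservative candidate moves the box by 10 units and the aggressive one by 40. The judge first prefers the aggressive updates, then switches to the conservative one near the target, which mirrors the motivation for generating candidates with diverse strategies.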

Code Implementations: 1 repo