SciFlow-Bench: Evaluating Structure-Aware Scientific Diagram Generation via Inverse Parsing
SciFlow-Bench addresses structurally inaccurate diagram generation for scientific applications, though the contribution is incremental: it offers an evaluation benchmark rather than a new generation method.
The paper tackles the problem of generating structurally correct scientific diagrams from text. It introduces SciFlow-Bench, a benchmark that evaluates pixel-based outputs by inverse-parsing generated images back into graphs, and finds that structural correctness remains a key challenge, especially for complex diagrams.
Scientific diagrams convey explicit structural information, yet modern text-to-image models often produce visually plausible but structurally incorrect results. Existing benchmarks either rely on image-centric or subjective metrics insensitive to structure, or evaluate intermediate symbolic representations rather than final rendered images, leaving pixel-based diagram generation underexplored. We introduce SciFlow-Bench, a structure-first benchmark for evaluating scientific diagram generation directly from pixel-level outputs. Built from real scientific PDFs, SciFlow-Bench pairs each source framework figure with a canonical ground-truth graph and evaluates models as black-box image generators under a closed-loop, round-trip protocol that inverse-parses generated diagram images back into structured graphs for comparison. This design enforces evaluation by structural recoverability rather than visual similarity alone, and is enabled by a hierarchical multi-agent system that coordinates planning, perception, and structural reasoning. Experiments show that preserving structural correctness remains a fundamental challenge, particularly for diagrams with complex topology, underscoring the need for structure-aware evaluation.
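The final comparison step of the round-trip protocol, in which a graph recovered from a generated image is scored against the canonical ground-truth graph, can be illustrated with a minimal sketch. The abstract does not specify the benchmark's actual metrics or matching rules, so everything here is an assumption made for illustration: the `Graph` type, the `normalize`, `make_graph`, `f1`, and `structural_scores` helpers, exact matching on normalized labels, and the use of node and edge F1 as scores. The inverse-parsing step itself (the multi-agent system) is not modeled; the parsed graph is simply given as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Hypothetical graph container: labeled nodes and directed edges."""
    nodes: frozenset[str]
    edges: frozenset[tuple[str, str]]


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial label variants match."""
    return " ".join(label.lower().split())


def make_graph(nodes, edges) -> Graph:
    """Build a Graph with normalized node labels and edge endpoints."""
    return Graph(
        nodes=frozenset(normalize(n) for n in nodes),
        edges=frozenset((normalize(s), normalize(t)) for s, t in edges),
    )


def f1(predicted: frozenset, reference: frozenset) -> float:
    """F1 overlap between two sets; two empty sets count as a perfect match."""
    if not predicted and not reference:
        return 1.0
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def structural_scores(parsed: Graph, truth: Graph) -> dict[str, float]:
    """Score a parsed graph against ground truth (assumed metrics, not the paper's)."""
    return {
        "node_f1": f1(parsed.nodes, truth.nodes),
        "edge_f1": f1(parsed.edges, truth.edges),
    }


# Ground-truth graph for a toy framework figure.
truth = make_graph(
    ["Encoder", "Decoder", "Loss"],
    [("Encoder", "Decoder"), ("Decoder", "Loss")],
)

# Graph recovered from a generated image: all boxes present, one arrow reversed.
parsed = make_graph(
    ["encoder", "Decoder", "Loss"],
    [("Encoder", "Decoder"), ("Loss", "Decoder")],
)

scores = structural_scores(parsed, truth)
print(scores)  # node_f1 is 1.0, edge_f1 is 0.5
```

In this toy case every node is recovered, so a check that only looks at which boxes appear would report a perfect result, while the reversed arrow halves the edge score. That gap is the kind of structural error the abstract argues image-centric metrics miss.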