AI · CV · May 28

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

arXiv:2605.30512 · 48.4 · h-index: 6
Predicted impact: top 74% in AI · last 90 days · Originality: Highly original
AI Analysis

This work tackles the generation of physically accurate diagrams from text for students and educators, a task at which existing generative models currently fall short.

The paper introduces PhyDrawGen, a neuro-symbolic pipeline that generates physics diagrams from natural language by separating semantic understanding from physical constraint satisfaction. It significantly outperforms GPT-5-image, Gemini 2.5 Flash, and Gemini 3 Pro on a benchmark of 1,449 problems across mechanics, optics, and electromagnetism, achieving robust physical accuracy.

Generating physics diagrams from text requires strict adherence to physical laws. While current generative models produce visually plausible outputs, they systematically hallucinate force vectors, ignore conservation laws, and violate geometric constraints. We present PhyDrawGen, a neuro-symbolic pipeline that decouples semantic scene understanding from physical constraint satisfaction. First, a large language model extracts a typed scene graph from the problem text. A deterministic solver then converts this graph into a Planar Straight-Line Graph (PSLG), encoding force balance, optical paths, and field topologies as exact geometric primitives. Finally, a fine-tuned Qwen-VL model implements a visually grounded propose-verify loop to iteratively correct any constraint violations. Evaluated on a benchmark of 1,449 problems spanning mechanics, optics, and electromagnetism, PhyDrawGen significantly outperforms GPT-5-image, Gemini 2.5 Flash, and Gemini 3 Pro, demonstrating robust physical accuracy even on unusual-object problems.
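The three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names (`Force`, `Body`, `PSLG`), the helper functions, and the single block-on-incline scenario are all hypothetical, the scene graph is built by hand rather than extracted by a language model, and the "verify" step is a plain force-balance check rather than a fine-tuned vision-language model. It only shows the idea of turning a typed scene description into exact line segments and then checking a physical constraint on the drawn geometry.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Force:                      # hypothetical typed scene-graph node
    name: str
    magnitude: float              # newtons
    angle_deg: float              # direction, counter-clockwise from +x

@dataclass
class Body:                       # hypothetical typed scene-graph node
    name: str
    position: Point
    forces: List[Force] = field(default_factory=list)

@dataclass
class PSLG:                       # vertices plus straight edges between them
    vertices: List[Point] = field(default_factory=list)
    edges: List[Tuple[int, int]] = field(default_factory=list)

    def add_segment(self, p: Point, q: Point) -> None:
        start = len(self.vertices)
        self.vertices += [p, q]
        self.edges.append((start, start + 1))

def block_on_incline(mass: float, incline_deg: float, g: float = 9.81) -> Body:
    """Hand-built scene graph: a block at rest on a rough incline."""
    w = mass * g
    theta = math.radians(incline_deg)
    return Body("block", (0.0, 0.0), [
        Force("gravity", w, -90.0),
        Force("normal", w * math.cos(theta), 90.0 + incline_deg),
        Force("friction", w * math.sin(theta), incline_deg),
    ])

def to_pslg(body: Body, scale: float = 0.1) -> PSLG:
    """Deterministic step: each force becomes one segment from the body."""
    graph = PSLG()
    x, y = body.position
    for f in body.forces:
        a = math.radians(f.angle_deg)
        tip = (x + scale * f.magnitude * math.cos(a),
               y + scale * f.magnitude * math.sin(a))
        graph.add_segment((x, y), tip)
    return graph

def net_force(graph: PSLG, scale: float = 0.1) -> Point:
    """Recover the net force from the drawn segments alone."""
    fx = sum(graph.vertices[j][0] - graph.vertices[i][0] for i, j in graph.edges)
    fy = sum(graph.vertices[j][1] - graph.vertices[i][1] for i, j in graph.edges)
    return fx / scale, fy / scale

def is_balanced(graph: PSLG, scale: float = 0.1, tol: float = 1e-6) -> bool:
    """Verification step: a static body must have zero net force."""
    fx, fy = net_force(graph, scale)
    return math.hypot(fx, fy) < tol
```

In this sketch, a diagram that omits the friction arrow fails `is_balanced`, which is the kind of violation a propose-verify loop would be expected to flag and correct.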

