AI | CV | Feb 12

Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing

arXiv:2602.11678v1 | h-index: 1 | Has Code
Originality: Highly original
AI Analysis

This paper addresses the unreliability of pixel-based methods for schematic auditing in engineering domains and offers a structure-aware approach aimed at practical deployment.

The paper tackles structural blindness in Multimodal Large Language Models (MLLMs) when they process engineering schematics. It proposes a Vector-to-Graph (V2G) pipeline that converts CAD diagrams into property graphs, making structural dependencies explicit. On a diagnostic benchmark of electrical compliance checks, V2G yields large accuracy gains, while MLLMs remain near chance level.

Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual understanding, yet they suffer from a critical limitation: structural blindness. Even state-of-the-art models fail to capture topology and symbolic logic in engineering schematics, as their pixel-driven paradigm discards the explicit vector-defined relations needed for reasoning. To overcome this, we propose a Vector-to-Graph (V2G) pipeline that converts CAD diagrams into property graphs where nodes represent components and edges encode connectivity, making structural dependencies explicit and machine-auditable. On a diagnostic benchmark of electrical compliance checks, V2G yields large accuracy gains across all error categories, while leading MLLMs remain near chance level. These results highlight the systemic inadequacy of pixel-based methods and demonstrate that structure-aware representations provide a reliable path toward practical deployment of multimodal AI in engineering domains. To facilitate further research, we release our benchmark and implementation at https://github.com/gm-embodied/V2G-Audit.
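
To make the property-graph idea concrete, here is a minimal Python sketch of a graph whose nodes are components and whose edges encode connectivity, plus one audit rule run over it. It is not the authors' implementation and does not reflect the V2G-Audit code: the function names (`build_graph`, `unprotected_loads`), the component fields, and the rule itself ("every load must be directly connected to a breaker") are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_graph(components, wires):
    """Build a toy property graph (hypothetical schema, not the paper's).

    Nodes keep their attributes; edges are stored as an undirected
    adjacency map keyed by component id.
    """
    nodes = {c["id"]: dict(c) for c in components}
    adjacency = defaultdict(set)
    for a, b in wires:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return nodes, adjacency


def unprotected_loads(nodes, adjacency):
    """Illustrative rule: flag loads with no breaker among their neighbours."""
    flagged = []
    for node_id, props in nodes.items():
        if props["type"] != "load":
            continue
        neighbour_types = {nodes[n]["type"] for n in adjacency[node_id]}
        if "breaker" not in neighbour_types:
            flagged.append(node_id)
    return sorted(flagged)


# Assumed example data: in a real pipeline these records would come from
# parsing the vector entities of a CAD file, which this sketch skips.
components = [
    {"id": "SRC", "type": "source"},
    {"id": "QF1", "type": "breaker", "rating_a": 16},
    {"id": "M1", "type": "load", "current_a": 10},
    {"id": "M2", "type": "load", "current_a": 6},
]
wires = [("SRC", "QF1"), ("QF1", "M1"), ("SRC", "M2")]

nodes, adjacency = build_graph(components, wires)
print(unprotected_loads(nodes, adjacency))  # prints ['M2']
```

The point of the sketch is that once connectivity is an explicit edge set, a compliance check becomes a deterministic graph query rather than a visual inference over pixels.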
