CV · LG · Apr 15

SynthPID: P&ID digitization from Topology-Preserving Synthetic Data

arXiv:2604.16513 · 11.8 · h-index: 1
Predicted impact: top 94% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For researchers and practitioners in industrial diagram digitization, SynthPID addresses the critical bottleneck of limited annotated real data by providing a high-quality synthetic alternative.

SynthPID introduces a synthetic P&ID dataset with topology seeded from real drawings, enabling a model trained solely on synthetic data to achieve 63.8% edge mAP on a real benchmark, closing within 8 percentage points of a real-data oracle.

Automating the digitization of Piping and Instrumentation Diagrams (P&IDs) into structured process graphs would unlock significant value in plant operations, yet progress is bottlenecked by a fundamental data problem: engineering drawings are proprietary, and the entire community shares a single public benchmark of just 12 annotated images. Prior attempts at synthetic augmentation have fallen short because template-based generators scatter symbols at random, producing graphs that bear little resemblance to real process plants and, accordingly, yield only approximately 33% edge detection accuracy under synth-only training. We argue the failure is structural rather than visual and address it by introducing SynthPID, a corpus of 665 synthetic P&IDs whose pipe topology is seeded directly from real drawings. Paired with a patch-based Relationformer adapted for high-resolution diagrams, a model trained on SynthPID alone achieves 63.8 ± 3.1% edge mAP on PID2Graph OPEN100 without seeing a single real P&ID during training, closing within 8 pp of the real-data oracle. These gains hold up under a controlled comparison against the template-based regime, confirming that generation quality drives performance rather than model choice. A scaling study reveals that gains flatten beyond roughly 400 synthetic images, pointing to seed diversity as the binding constraint.
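The abstract contrasts random symbol scattering with generation whose pipe topology is seeded from real drawings. The sketch below illustrates that idea only in spirit: it keeps a seed graph's edge list unchanged while swapping symbol variants and jittering positions. The abstract does not describe the actual SynthPID generator, so the function name `synthesize_from_seed`, the node and edge representation, the `symbol_pool` mapping, and the `jitter` parameter are all hypothetical.

```python
import random


def synthesize_from_seed(seed_nodes, seed_edges, symbol_pool, jitter=20.0, rng=None):
    """Hypothetical sketch: derive a synthetic diagram that keeps the seed's connectivity.

    seed_nodes  : dict mapping node_id -> (x, y, symbol_class)
    seed_edges  : list of (node_id, node_id) pipe connections
    symbol_pool : dict mapping symbol_class -> list of interchangeable variants
    jitter      : maximum positional offset applied to each node (assumed units: pixels)
    """
    rng = rng or random.Random()
    new_nodes = {}
    for node_id, (x, y, symbol_class) in seed_nodes.items():
        # Vary appearance: pick a variant of the same class, or keep the class name
        # if the pool has no variants for it.
        variant = rng.choice(symbol_pool.get(symbol_class, [symbol_class]))
        # Vary layout slightly without changing which nodes exist.
        new_nodes[node_id] = (
            x + rng.uniform(-jitter, jitter),
            y + rng.uniform(-jitter, jitter),
            variant,
        )
    # The edge list is copied unchanged: this is the topology-preserving step.
    return new_nodes, list(seed_edges)


# Illustrative seed graph (invented for this example, not taken from any dataset).
seed_nodes = {
    "v1": (100.0, 100.0, "valve"),
    "p1": (300.0, 100.0, "pump"),
    "t1": (300.0, 250.0, "tank"),
}
seed_edges = [("v1", "p1"), ("p1", "t1")]
symbol_pool = {"valve": ["gate_valve", "ball_valve"], "pump": ["centrifugal_pump"]}

nodes, edges = synthesize_from_seed(
    seed_nodes, seed_edges, symbol_pool, rng=random.Random(0)
)
```

Because every sample inherits its connectivity from a seed, the number of distinct topologies is bounded by the number of seeds, which is consistent with the abstract's observation that seed diversity becomes the binding constraint.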
