CV · AI · Jan 30

Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models

arXiv:2601.22754v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the labor-intensive, error-prone manual extraction of procedural knowledge from troubleshooting guides, a prerequisite for systems that support shop-floor personnel in diagnosing equipment issues. It is an incremental improvement that applies existing Vision Language Models (VLMs) to a new domain-specific data type.

The paper tackles the problem of automating the extraction of structured procedural knowledge from industrial troubleshooting guides, which take the form of flowchart-like diagrams. It evaluates two Vision Language Models under two prompting strategies. The results reveal model-specific trade-offs between layout sensitivity and semantic robustness, which can inform practical deployment decisions.

Industrial troubleshooting guides encode diagnostic procedures in flowchart-like diagrams where spatial layout and technical language jointly convey meaning. To integrate this knowledge into operator support systems, which assist shop-floor personnel in diagnosing and resolving equipment issues, the information must first be extracted and structured for machine interpretation. When performed manually, however, this extraction is labor-intensive and error-prone. Vision Language Models (VLMs) offer the potential to automate this process by jointly interpreting visual and textual meaning, yet their performance on such guides remains underexplored. This paper evaluates two VLMs on extracting structured knowledge, comparing two prompting strategies: a standard instruction-guided approach and an augmented approach that cues troubleshooting layout patterns. Results reveal model-specific trade-offs between layout sensitivity and semantic robustness, informing practical deployment decisions.

