AIIR · Nov 27, 2025

Structured Extraction from Business Process Diagrams Using Vision-Language Models

arXiv:2511.22448v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of analyzing business process diagrams when the original source files are unavailable. It is an incremental contribution, building on existing VLMs and OCR methods.

The paper tackles the problem of extracting structured JSON representations from images of BPMN diagrams without access to the source files. It uses Vision-Language Models for extraction and OCR for text enrichment; adding OCR improved performance for several of the models evaluated.
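The OCR enrichment step is named but not specified in this summary. As a hedged illustration of one way it could work, the sketch below snaps each label in a VLM's JSON output to the closest OCR-recognized string using Python's `difflib.get_close_matches`. The JSON schema (`elements`, `type`, `name`), the sample data, the function `enrich_labels`, and the 0.8 similarity cutoff are all hypothetical choices for illustration, not the paper's method.

```python
import difflib
import json

# Hypothetical VLM output: the schema ("elements", "type", "name") is an
# assumption for illustration, not the paper's actual JSON format.
vlm_output = json.loads("""
{"elements": [
  {"type": "task", "name": "Chek invoice"},
  {"type": "exclusiveGateway", "name": "Approved?"},
  {"type": "endEvent", "name": "Paymnet sent"}
]}
""")

# Hypothetical OCR strings read from the same diagram image.
ocr_texts = ["Check invoice", "Approved?", "Payment sent", "Finance"]


def enrich_labels(elements, ocr_texts, cutoff=0.8):
    """Replace each element name with its closest OCR string when one is
    similar enough (difflib ratio >= cutoff); otherwise keep the VLM label."""
    enriched = []
    for element in elements:
        name = element.get("name", "")
        matches = difflib.get_close_matches(name, ocr_texts, n=1, cutoff=cutoff)
        # Copy the element so the original VLM output is left untouched.
        enriched.append({**element, "name": matches[0] if matches else name})
    return enriched


for element in enrich_labels(vlm_output["elements"], ocr_texts):
    print(element["type"], "->", element["name"])
```

With this sample data, the misread labels `Chek invoice` and `Paymnet sent` are replaced by their OCR counterparts, while a label with no sufficiently similar OCR string keeps its VLM reading. A fixed cutoff is the simplest design choice; setting it too low risks snapping labels to unrelated text elsewhere in the diagram.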

Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. While BPMN diagrams are often exchanged as visual images, existing methods primarily rely on XML representations for computational analysis. In this work, we present a pipeline that leverages Vision-Language Models (VLMs) to extract structured JSON representations of BPMN diagrams directly from images, without requiring source model files or textual annotations. We also incorporate optical character recognition (OCR) for textual enrichment and evaluate the generated element lists against ground truth data derived from the source XML files. Our approach enables robust component extraction in scenarios where original source files are unavailable. We benchmark multiple VLMs and observe performance improvements in several models when OCR is used for text enrichment. In addition, we conduct extensive statistical analyses of OCR-based enrichment methods and prompt ablation studies, providing a clearer understanding of their impact on model performance.
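The abstract says generated element lists are scored against ground truth derived from the source XML files, but it does not give the matching rules or metrics. A minimal sketch, assuming exact matching on `(type, name)` pairs and multiset precision, recall, and F1: it parses a BPMN 2.0 file with the standard model namespace using `xml.etree.ElementTree` and compares it with a predicted list. The scored element-type subset, the sample XML, the predicted list, and the helper names `ground_truth_elements` and `score` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard BPMN 2.0 model namespace, in ElementTree's {uri}tag form.
BPMN_NS = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"

# Element types to score; this subset is an assumption, not the paper's list.
ELEMENT_TYPES = {"startEvent", "endEvent", "task", "userTask", "serviceTask",
                 "exclusiveGateway", "parallelGateway"}

# A tiny hand-written BPMN file standing in for a real source model.
SOURCE_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s1" name="Invoice received"/>
    <task id="t1" name="Check invoice"/>
    <exclusiveGateway id="g1" name="Approved?"/>
    <endEvent id="e1" name="Payment sent"/>
    <sequenceFlow id="f1" sourceRef="s1" targetRef="t1"/>
  </process>
</definitions>"""


def ground_truth_elements(xml_text):
    """Return (type, name) pairs for the scored element types in the XML."""
    root = ET.fromstring(xml_text)
    pairs = []
    for process in root.iter(BPMN_NS + "process"):
        # iter() also walks nested elements, e.g. inside subprocesses.
        for node in process.iter():
            local = node.tag.replace(BPMN_NS, "")
            if local in ELEMENT_TYPES:
                pairs.append((local, (node.get("name") or "").strip()))
    return pairs


def score(predicted, truth):
    """Precision, recall and F1 over multisets of (type, name) pairs."""
    pred_counts, true_counts = Counter(predicted), Counter(truth)
    hits = sum((pred_counts & true_counts).values())  # multiset intersection
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


truth = ground_truth_elements(SOURCE_XML)
# Hypothetical VLM element list: one label misread, one element missed.
predicted = [("startEvent", "Invoice received"), ("task", "Chek invoice"),
             ("exclusiveGateway", "Approved?")]
print(score(predicted, truth))
```

Here two of three predictions match, so precision is 2/3, recall is 2/4, and F1 is 4/7. Exact name matching counts the single-character misread `Chek invoice` as a miss, which is one reason a normalized or fuzzy label comparison might be preferred in practice.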

