CV | AI | May 2, 2025

Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer

Muhammad Tayyab Khan, Zane Yong, Lequn Chen, Jun Ming Tan, Wenhe Feng, Seung Ki Moon

arXiv:2505.01530v3 | 6.25 citations | h-index: 15 | IEEM

Originality: Incremental advance

AI Analysis

This work addresses two bottlenecks in high-precision manufacturing: manual extraction of drawing information, which is slow and labor-intensive, and traditional OCR, which produces unstructured output. It offers an incremental improvement aimed at scalable deployment in precision-driven industries.

The paper tackles the problem of extracting structured information from 2D engineering drawings with a hybrid deep learning framework that combines an Oriented Bounding Box (OBB) detection model (YOLOv11) with a transformer-based document parsing model (Donut). With a single Donut model fine-tuned across all categories, it reports high precision (e.g., 94.77% for GD&T) and an F1 score of 97.3%, while reducing hallucinations to 5.23%.
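The glue between the two stages can be sketched as two small functions: one that cuts a detected oriented box out of the drawing as an upright patch, and one that turns a Donut-style output sequence into a dictionary. This is a minimal sketch under stated assumptions, not the authors' code. The box is assumed to be given as center, width, height and a rotation angle in radians (similar to the `xywhr` convention of Ultralytics OBB results); the crop uses nearest-neighbor sampling, which may differ from the paper's cropping; and the `<s_key>value</s_key>` tag format follows the original Donut convention. Hugging Face's `DonutProcessor.token2json` handles the general, nested case. The names `crop_obb`, `donut_to_json`, and the field names in the example are hypothetical.

```python
import re

import numpy as np


def crop_obb(image: np.ndarray, cx: float, cy: float, w: float, h: float,
             theta: float, fill: int = 255) -> np.ndarray:
    """Cut an oriented box out of `image` as an upright (deskewed) patch.

    Assumption: (cx, cy, w, h) are in pixels and `theta` is the angle in
    radians of the box's width axis relative to the image x-axis (y down).
    Nearest-neighbour sampling; pixels outside the image get `fill`.
    """
    out_w, out_h = int(round(w)), int(round(h))
    # Offsets of each output pixel centre from the box centre, in box axes.
    u = np.arange(out_w) - out_w / 2 + 0.5
    v = np.arange(out_h) - out_h / 2 + 0.5
    uu, vv = np.meshgrid(u, v)  # both shaped (out_h, out_w)
    # Rotate the box-axis offsets into image coordinates.
    xs = cx + uu * np.cos(theta) - vv * np.sin(theta)
    ys = cy + uu * np.sin(theta) + vv * np.cos(theta)
    xi, yi = np.floor(xs).astype(int), np.floor(ys).astype(int)
    height, width = image.shape[:2]
    inside = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
    out = np.full((out_h, out_w) + image.shape[2:], fill, dtype=image.dtype)
    out[inside] = image[yi[inside], xi[inside]]
    return out


# Matches flat Donut-style fields such as <s_key>value</s_key>.
FIELD = re.compile(r"<s_(?P<key>[^>]+)>(?P<value>.*?)</s_(?P=key)>", re.DOTALL)


def donut_to_json(sequence: str) -> dict:
    """Turn a decoded Donut-style tag sequence into a flat dict.

    Only top-level fields are extracted; nested groups stay as raw strings.
    """
    return {m.group("key"): m.group("value").strip()
            for m in FIELD.finditer(sequence)}
```

For example, the hypothetical sequence `<s_category>Threads</s_category><s_designation>M8x1.25</s_designation>` parses to `{"category": "Threads", "designation": "M8x1.25"}`. Deskewing the crop keeps rotated callouts horizontal, which is the orientation a document parser is most likely to have seen during pre-training.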

Accurate extraction of key information from 2D engineering drawings is crucial for high-precision manufacturing. Manual extraction is slow and labor-intensive, while traditional Optical Character Recognition (OCR) techniques often struggle with complex layouts and overlapping symbols, resulting in unstructured outputs. To address these challenges, this paper proposes a novel hybrid deep learning framework for structured information extraction by integrating an Oriented Bounding Box (OBB) detection model with a transformer-based document parsing model (Donut). An in-house annotated dataset is used to train YOLOv11 for detecting nine key categories: Geometric Dimensioning and Tolerancing (GD&T), General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. Detected OBBs are cropped into images and labeled to fine-tune Donut for structured JSON output. Two fine-tuning strategies are compared: a single model trained across all categories, and separate category-specific models. Results show that the single model consistently outperforms category-specific ones across all evaluation metrics, achieving higher precision (94.77% for GD&T), recall (100% for most categories), and F1 score (97.3%), while reducing hallucinations (5.23%). The proposed framework improves accuracy, reduces manual effort, and supports scalable deployment in precision-driven industries.
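The abstract reports precision, recall, F1 and a hallucination rate but does not define how they are counted. The sketch below assumes field-level scoring of each parsed crop against its labeled JSON: a field counts as correct only if both key and value match, and a "hallucination" is assumed to be a predicted field with no counterpart in the ground truth. These definitions, the function names, and the field names used in any example (`type`, `tolerance`, `datum`, `modifier`) are hypothetical. The last line is an arithmetic check only: precision 94.77% with recall 100% gives an F1 of about 97.31%, close to the reported 97.3%, though the abstract does not say that figure was computed from this pair.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def field_scores(pred: dict, truth: dict) -> dict:
    """Field-level scores for one cropped region, under assumed definitions.

    - true positive: predicted key exists in `truth` with an identical value
      (after stripping surrounding whitespace)
    - hallucination: predicted key that does not exist in `truth` at all
    """
    tp = sum(1 for k, v in pred.items()
             if k in truth and v.strip() == truth[k].strip())
    invented = sum(1 for k in pred if k not in truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "hallucination_rate": invented / len(pred) if pred else 0.0,
    }


# Arithmetic check only: P = 0.9477, R = 1.0.
print(f"{f1_score(0.9477, 1.0):.4f}")  # prints 0.9731
```

Under these assumptions, a prediction with a wrong value for an existing key lowers precision and recall but is not counted as a hallucination; a stricter definition could count both cases.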
