ViRED: Prediction of Visual Relations in Engineering Drawings
This work addresses the challenge of interpreting image-heavy documents, such as electrical engineering drawings, for professionals in that domain. The contribution is incremental, however, because it adapts existing visual relation detection methods to a specific application.
The paper tackles the problem of understanding engineering drawings by predicting visual relations between tables and circuits, reaching 96% accuracy in relation prediction, a substantial improvement over existing methods.
To understand engineering drawings accurately, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly treat text as the main modality, which makes them unsuitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inherently limits its capacity to assess relationships among all entity pairs in the drawings. To address this issue, we propose a vision-based relation detection model, named ViRED, to identify the associations between tables and circuits in electrical engineering drawings. Our model consists of three main parts: a vision encoder, an object encoder, and a relation decoder. We implement ViRED in PyTorch and conduct a series of experiments to validate its efficacy. The experimental results indicate that, on the engineering drawing dataset, our approach attains an accuracy of 96% in the task of relation prediction, a substantial improvement over existing methodologies. The results also show that ViRED performs inference quickly even when a single engineering drawing contains numerous objects.
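To make the idea of assessing every table-circuit pair concrete, the following is a minimal sketch in plain Python, not the ViRED implementation. It assumes each object already has an embedding vector (in ViRED these would come from the vision and object encoders) and substitutes a simple dot product followed by a sigmoid for the relation decoder. The function names, the toy embeddings, the 0.5 threshold, and the pairwise definition of accuracy are all assumptions made for illustration; the source does not specify how accuracy is computed.

```python
import math
from itertools import product


def sigmoid(x):
    """Map a raw score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_relations(table_embs, circuit_embs, threshold=0.5):
    """Score every (table, circuit) pair and keep those above the threshold.

    The dot product is a stand-in for a learned relation decoder (assumption).
    """
    predicted = set()
    for (t_id, t_vec), (c_id, c_vec) in product(table_embs.items(),
                                                circuit_embs.items()):
        score = sigmoid(sum(a * b for a, b in zip(t_vec, c_vec)))
        if score > threshold:
            predicted.add((t_id, c_id))
    return predicted


def pairwise_accuracy(predicted, truth, table_ids, circuit_ids):
    """Fraction of all (table, circuit) pairs classified correctly.

    This pairwise definition is an assumption, not taken from the paper.
    """
    pairs = list(product(table_ids, circuit_ids))
    correct = sum((p in predicted) == (p in truth) for p in pairs)
    return correct / len(pairs)


# Toy, hand-made embeddings (hypothetical values for illustration only).
tables = {"table_A": [1.0, 0.0], "table_B": [0.0, 1.0]}
circuits = {
    "circuit_1": [2.0, -2.0],
    "circuit_2": [-2.0, 2.0],
    "circuit_3": [-1.0, -1.0],
}
truth = {("table_A", "circuit_1"), ("table_B", "circuit_2")}

predicted = predict_relations(tables, circuits)
accuracy = pairwise_accuracy(predicted, truth, tables, circuits)
print(sorted(predicted), accuracy)
```

Scoring the full Cartesian product of tables and circuits means no pair is skipped, at the cost of work that grows with the product of the two object counts. This is why inference speed with many objects per drawing is worth reporting.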