CV · AI · May 12

Unlocking UML Class Diagram Understanding in Vision Language Models

arXiv:2605.11634 · 11.1

Predicted impact: top 60% in CV (last 90 days) · Originality: Synthesis-oriented

AI Analysis

The paper addresses the limited ability of VLMs to understand computer science diagrams by providing a benchmark and training data for this domain.

The paper introduces a benchmark for visual question answering on UML class diagrams, showing that a LoRA-based fine-tune outperforms Qwen 3.5 27B on this task.

Although Vision Language Models (VLMs) have seen tremendous progress across all kinds of use cases, they still lag behind in answering questions about diagrams compared to photos. While progress has been made on bar charts, line charts, and similar diagrams, there is still little research on other types of diagrams, e.g. those in the computer science domain. Our work presents a benchmark for visual question answering based on UML class diagrams that is both challenging and manageable. We further construct a large-scale training dataset with 16,000 image-question-answer triples and show that a LoRA-based fine-tune easily outperforms Qwen 3.5 27B, a recent VLM that performs well on many other benchmarks.
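The abstract does not specify how the image-question-answer triples are stored or how benchmark answers are scored. As a minimal sketch, assuming each example is an image path plus a question and reference answer, and assuming answers are scored by normalized exact match (a common VQA convention, not confirmed by the paper), a triple and an accuracy check might look like the following. The names `VQATriple`, `normalize`, and `exact_match_accuracy`, as well as the example file names and questions, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VQATriple:
    """One image-question-answer example (hypothetical schema, not the paper's format)."""
    image_path: str  # path to a rendered UML class diagram image
    question: str    # natural-language question about the diagram
    answer: str      # reference answer


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences
    # are not counted as wrong answers (an assumed scoring convention).
    return " ".join(text.lower().split())


def exact_match_accuracy(examples: list[VQATriple], predictions: list[str]) -> float:
    """Fraction of predictions equal to the reference answer after normalization."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    hits = sum(normalize(pred) == normalize(ex.answer)
               for ex, pred in zip(examples, predictions))
    return hits / len(examples)


# Hypothetical usage with two made-up examples.
examples = [
    VQATriple("diagram_001.png", "How many classes inherit from Vehicle?", "2"),
    VQATriple("diagram_002.png", "Which class owns the 'balance' attribute?", "Account"),
]
print(exact_match_accuracy(examples, ["2", " account "]))  # 1.0
```

Exact match is deliberately strict; a real benchmark might instead use multiple-choice options or a more lenient answer matcher, which the abstract leaves open.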

