CV · HC · Nov 30, 2022

SGDraw: Scene Graph Drawing Interface Using Object-Oriented Representation

arXiv:2211.16697v2
h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of inefficient scene graph annotation for researchers and practitioners in computer vision, offering an incremental improvement over existing interfaces.

The authors address the difficulty of drawing proper scene graphs for computer vision tasks by proposing SGDraw, an interactive web-based interface built on an object-oriented representation. In user studies, it produced scene graphs with richer details and more accurate image descriptions than traditional tools.

Scene understanding is an essential and challenging task in computer vision. To provide the visually fundamental graphical structure of an image, the scene graph has received increased attention due to its powerful semantic representation. However, it is difficult to draw a proper scene graph for image retrieval, image generation, and multi-modal applications. The conventional scene graph annotation interface is not easy to use for image annotation, and automatic scene graph generation approaches based on deep neural networks tend to generate redundant content while disregarding details. In this work, we propose SGDraw, a scene graph drawing interface that uses an object-oriented scene graph representation to help users draw and edit scene graphs interactively. In the proposed object-oriented representation, we treat an object together with its attributes and relationships as a single structural unit. SGDraw provides a web-based scene graph annotation and generation tool for scene understanding applications. To verify the effectiveness of the proposed interface, we conducted a comparison study against the conventional tool and a user experience study. The results show that SGDraw can help generate scene graphs with richer details and describe images more accurately than traditional bounding box annotations. We believe the proposed SGDraw can be useful in various vision tasks, such as image retrieval and generation.
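To make the object-oriented representation concrete, here is a minimal Python sketch in which each object carries its own attributes and outgoing relationships as one unit. This is an illustration under stated assumptions, not SGDraw's actual data model: the class names (`SceneObject`, `Relationship`, `SceneGraph`), the method names, and the "is" predicate used for attributes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Relationship:
    """A directed edge from the owning object to another object (hypothetical)."""
    predicate: str
    target: str  # name of the object this relationship points to


@dataclass
class SceneObject:
    """One structural unit: an object plus its attributes and relationships."""
    name: str
    attributes: List[str] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


@dataclass
class SceneGraph:
    objects: Dict[str, SceneObject] = field(default_factory=dict)

    def add_object(self, name: str, attributes: Tuple[str, ...] = ()) -> SceneObject:
        # Create the object if it is new, then merge in any new attributes.
        obj = self.objects.setdefault(name, SceneObject(name))
        for attribute in attributes:
            if attribute not in obj.attributes:
                obj.attributes.append(attribute)
        return obj

    def relate(self, subject: str, predicate: str, target: str) -> None:
        # Make sure both endpoints exist, then store the edge on the subject.
        subject_obj = self.add_object(subject)
        self.add_object(target)
        subject_obj.relationships.append(Relationship(predicate, target))

    def triples(self) -> List[Tuple[str, str, str]]:
        # Flatten the object-oriented form into (subject, predicate, object)
        # triples; attributes are emitted with the assumed predicate "is".
        result = []
        for obj in self.objects.values():
            for attribute in obj.attributes:
                result.append((obj.name, "is", attribute))
            for rel in obj.relationships:
                result.append((obj.name, rel.predicate, rel.target))
        return result


graph = SceneGraph()
graph.add_object("man", ("tall",))
graph.add_object("horse", ("brown",))
graph.relate("man", "riding", "horse")
```

Calling `graph.triples()` flattens the per-object units into a conventional triple list, which suggests how an object-centred editor could still export to triple-based scene graph formats.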

Code Implementations: 3 repos
