CV | Mar 20, 2023

Location-Free Scene Graph Generation

Ege Özsoy, Felix Holm, Mahdi Saleh, Tobias Czempiel, Chantal Pellegrini, Nassir Navab, Benjamin Busam

arXiv:2303.10944v3 | 3.95 citations | h-index: 58 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the high annotation costs and dataset limitations of scene graph generation in computer vision, though the advance is incremental because it builds on existing SGG tasks.

The paper removes the need for location labels such as bounding boxes in scene graph generation. It introduces location-free scene graph generation (LF-SGG) and proposes Pix2SG, an autoregressive method that achieves competitive performance on three datasets and on two downstream tasks, image retrieval and visual question answering.

Scene Graph Generation (SGG) is a visual understanding task that aims to describe a scene as a graph of entities and their relationships with each other. Existing works rely on location labels in the form of bounding boxes or segmentation masks, which increases annotation costs and limits dataset expansion. Recognizing that many applications do not require location data, we break this dependency and introduce location-free scene graph generation (LF-SGG). This new task aims at predicting instances of entities, as well as their relationships, without the explicit calculation of their spatial localization. To objectively evaluate the task, the predicted and ground truth scene graphs need to be compared. We solve this NP-hard problem through an efficient branching algorithm. Additionally, we design the first LF-SGG method, Pix2SG, using autoregressive sequence modeling. We demonstrate the effectiveness of our method on three scene graph generation datasets as well as two downstream tasks, image retrieval and visual question answering, and show that our approach is competitive with existing methods while not relying on location cues.
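
To illustrate why comparing two location-free scene graphs is a matching problem, here is a minimal sketch. It is not the paper's branching algorithm. It is a plain backtracking search, and the triplet representation, the `best_overlap` function, and the toy data are all assumptions made for this example. Without boxes, the instance index of a predicted entity carries no meaning of its own, so the search tries every class-preserving one-to-one assignment of predicted instances to ground truth instances and keeps the one that recovers the most triplets.

```python
# Hypothetical representation: an entity is (class_name, instance_index),
# a scene graph is a list of (subject, predicate, object) triplets.
# No bounding boxes or masks are involved.

def best_overlap(pred, gt):
    """Return the largest number of predicted triplets that match ground
    truth triplets under any class-preserving, one-to-one mapping of
    predicted instances to ground truth instances (exhaustive search)."""
    pred_triplets = set(pred)
    gt_triplets = set(gt)
    pred_ents = sorted({e for s, _, o in pred_triplets for e in (s, o)})
    gt_ents = sorted({e for s, _, o in gt_triplets for e in (s, o)})
    best = 0

    def score(mapping):
        # Unmapped entities become None and therefore never match.
        return sum(
            (mapping.get(s), p, mapping.get(o)) in gt_triplets
            for s, p, o in pred_triplets
        )

    def branch(i, mapping, used):
        nonlocal best
        if i == len(pred_ents):
            best = max(best, score(mapping))
            return
        ent = pred_ents[i]
        # Branch 1: leave this predicted instance unmatched.
        branch(i + 1, mapping, used)
        # Branch 2: try every free ground truth instance of the same class.
        for cand in gt_ents:
            if cand[0] == ent[0] and cand not in used:
                mapping[ent] = cand
                used.add(cand)
                branch(i + 1, mapping, used)
                used.remove(cand)
                del mapping[ent]

    branch(0, {}, set())
    return best


# Toy example: the prediction numbers the two persons in the opposite order.
gt = [
    (("person", 0), "riding", ("horse", 0)),
    (("person", 1), "next to", ("horse", 0)),
]
pred = [
    (("person", 0), "next to", ("horse", 0)),
    (("person", 1), "riding", ("horse", 0)),
]
print(best_overlap(pred, gt))  # 2
```

A naive index-by-index comparison would score this prediction as zero matches, while the search finds that swapping the two person instances recovers both triplets. The exhaustive search grows factorially with the number of same-class instances, which is the cost an efficient branching strategy is meant to reduce.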
