CV | Mar 21, 2024

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation

arXiv:2403.14886v1 | 20 citations | h-index: 13 | Has Code | CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of predicting detailed spatial and semantic relationships between objects in images for computer vision applications. It is an incremental advance over existing Transformer-based methods.

The paper proposes DSGG, a Transformer-based method that treats scene graph generation as a direct graph prediction problem using graph-aware queries. It reports state-of-the-art results, improving mR@50 and mR@100 by 3.5% and 6.7% on the VG dataset and by 8.5% and 10.3% on the PSG dataset.
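The mR@K figures quoted above are mean recall at K: recall is computed separately for each predicate class and then averaged, so rare predicates count as much as frequent ones. The following is a minimal sketch of that idea, not the benchmark's evaluation code. It assumes triplets are matched on labels only and pools hits over the whole dataset; real evaluations also require box or mask overlap and may average per image. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def mean_recall_at_k(predictions, ground_truth, k):
    """Simplified mR@K (assumption: label-only matching, pooled over images).

    predictions: one list per image of (score, (subject, predicate, object)).
    ground_truth: one set per image of (subject, predicate, object) triplets.
    """
    hits = defaultdict(int)    # correctly recalled triplets per predicate
    totals = defaultdict(int)  # ground-truth triplets per predicate
    for preds, gt in zip(predictions, ground_truth):
        # Keep only the K highest-scoring predicted triplets for this image.
        ranked = sorted(preds, key=lambda p: p[0], reverse=True)[:k]
        top_k = {triplet for _, triplet in ranked}
        for triplet in gt:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
    if not totals:
        return 0.0
    # Average the per-predicate recalls so each class has equal weight.
    recalls = [hits[p] / totals[p] for p in totals]
    return sum(recalls) / len(recalls)


# Toy example with one image and two ground-truth predicates.
example_predictions = [[
    (0.9, ("person", "on", "horse")),
    (0.8, ("person", "near", "tree")),
    (0.1, ("horse", "eating", "grass")),
]]
example_ground_truth = [{
    ("person", "on", "horse"),
    ("horse", "eating", "grass"),
}]
recall_at_2 = mean_recall_at_k(example_predictions, example_ground_truth, 2)
recall_at_3 = mean_recall_at_k(example_predictions, example_ground_truth, 3)
```

In the toy example only the "on" triplet is in the top 2, so one of the two predicate classes is recalled; raising K to 3 recovers the "eating" triplet as well.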

Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image, which is challenging due to incomplete labelling, long-tailed relationship categories, and relational semantic overlap. Existing Transformer-based methods either employ distinct queries for objects and predicates or utilize holistic queries for relation triplets, and hence often suffer from limited capacity in learning low-frequency relationships. In this paper, we present a new Transformer-based method, called DSGG, that views scene graph detection as a direct graph prediction problem based on a unique set of graph-aware queries. In particular, each graph-aware query encodes a compact representation of both the node and all of its relations in the graph, acquired through relaxed sub-graph matching during training. Moreover, to address the problem of relational semantic overlap, we utilize a strategy for relation distillation, aiming to efficiently learn multiple instances of semantic relationships. Extensive experiments on the VG and the PSG datasets show that our model achieves state-of-the-art results, with a significant improvement of 3.5% and 6.7% in mR@50 and mR@100 for the scene graph generation task and an even more substantial improvement of 8.5% and 10.3% in mR@50 and mR@100 for the panoptic scene graph generation task. Code is available at https://github.com/zeeshanhayder/DSGG.
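To make the "direct graph prediction" framing concrete, here is a rough sketch of how per-query outputs could be turned into ranked triplets. It is an illustration under stated assumptions, not the authors' implementation: it assumes each query yields one node label with a confidence and a row of predicate scores toward every other query, and it ranks triplets by the product of the three scores. The function name, argument layout, and scoring rule are all hypothetical.

```python
def decode_scene_graph(node_labels, node_scores, rel_scores, predicates, top_k):
    """Rank (subject, predicate, object) triplets from dense graph outputs.

    node_labels[i], node_scores[i]: label and confidence for query i.
    rel_scores[i][j][p]: score that predicate p holds from node i to node j.
    Assumption: triplet score = subject score * object score * relation score.
    """
    triplets = []
    n = len(node_labels)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-relations
            for p, name in enumerate(predicates):
                score = node_scores[i] * node_scores[j] * rel_scores[i][j][p]
                triplets.append((score, (node_labels[i], name, node_labels[j])))
    # Highest-scoring triplets first; keep only the requested number.
    triplets.sort(key=lambda t: t[0], reverse=True)
    return triplets[:top_k]


# Toy example: two nodes, two predicate classes.
labels = ["person", "horse"]
scores = [0.9, 0.8]
predicate_names = ["on", "near"]
relations = [
    [[0.0, 0.0], [0.7, 0.2]],   # person -> person (unused), person -> horse
    [[0.1, 0.05], [0.0, 0.0]],  # horse -> person, horse -> horse (unused)
]
top_triplets = decode_scene_graph(labels, scores, relations, predicate_names, 2)
```

Because every predicate is scored for every ordered pair, a single pair of nodes can contribute more than one triplet to the ranked list.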

Code Implementations: 1 repo
