CV · Sep 16, 2024

Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation

arXiv:2409.10262v2 · 11 citations · h-index: 22
AI Analysis

This work addresses a specific bottleneck in scene graph generation for computer vision applications, offering incremental improvements over existing DETR-based methods.

The paper tackles two problems in one-stage scene graph generation: sparse supervision and false negative samples. It proposes Hydra-SGG, whose hybrid relation assignment increases the number of positive training samples, and reports state-of-the-art results of 16.0 mR@50 on VG150, a 50.1 weighted score on Open Images V6, and 12.7 mR@50 on GQA.

DETR introduces a simplified one-stage framework for scene graph generation (SGG) but faces challenges of sparse supervision and false negative samples. The former occurs because each image typically contains fewer than 10 relation annotations, while DETR-based SGG models employ over 100 relation queries. Each ground truth relation is assigned to only one query during training. The latter arises when one ground truth relation may have multiple queries with similar matching scores, leading to suboptimally matched queries being treated as negative samples. To address these, we propose Hydra-SGG, a one-stage SGG method featuring a Hybrid Relation Assignment. This approach combines a One-to-One Relation Assignment with an IoU-based One-to-Many Relation Assignment, increasing positive training samples and mitigating sparse supervision. In addition, we empirically demonstrate that removing self-attention between relation queries leads to duplicate predictions, which actually benefits the proposed One-to-Many Relation Assignment. With this insight, we introduce Hydra Branch, an auxiliary decoder without self-attention layers, to further enhance One-to-Many Relation Assignment by promoting different queries to make the same relation prediction. Hydra-SGG achieves state-of-the-art performance on multiple datasets, including VG150 (16.0 mR@50), Open Images V6 (50.1 weighted score), and GQA (12.7 mR@50).
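The difference between the two assignment schemes in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation. It assumes that a relation query is scored by the smaller of its subject-box and object-box IoU with the ground truth, and that the one-to-many rule marks a query positive when that score reaches a threshold of 0.5. The abstract only states that the assignment is IoU-based, so the scoring rule, the threshold, and all names (`pair_score`, `one_to_one`, `one_to_many`) are hypothetical. A brute-force search stands in for the Hungarian matching that DETR-style models normally use.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pair_score(query, gt):
    """Assumed matching score: the weaker of the subject and object overlaps."""
    return min(iou(query["sub"], gt["sub"]), iou(query["obj"], gt["obj"]))


def one_to_one(queries, gts):
    """Each ground truth gets exactly one query (brute-force stand-in for Hungarian matching)."""
    best, best_total = None, -1.0
    for perm in permutations(range(len(queries)), len(gts)):
        total = sum(pair_score(queries[q], gts[g]) for g, q in enumerate(perm))
        if total > best_total:
            best, best_total = perm, total
    return {q: g for g, q in enumerate(best)}


def one_to_many(queries, gts, thresh=0.5):
    """Every query whose best-overlapping ground truth scores >= thresh becomes positive."""
    assigned = {}
    for q_idx, query in enumerate(queries):
        scores = [pair_score(query, gt) for gt in gts]
        g = max(range(len(gts)), key=scores.__getitem__)
        if scores[g] >= thresh:
            assigned[q_idx] = g
    return assigned


# Toy data (hypothetical): one annotated relation, four relation queries.
gts = [{"sub": (0, 0, 10, 10), "obj": (20, 20, 30, 30)}]
queries = [
    {"sub": (0, 0, 10, 10), "obj": (20, 20, 30, 30)},    # exact match
    {"sub": (1, 0, 11, 10), "obj": (20, 20, 30, 30)},    # near duplicate
    {"sub": (50, 50, 60, 60), "obj": (70, 70, 80, 80)},  # unrelated pair
    {"sub": (0, 0, 10, 10), "obj": (60, 60, 70, 70)},    # right subject, wrong object
]

o2o = one_to_one(queries, gts)
o2m = one_to_many(queries, gts)
print("one-to-one positives: ", sorted(o2o))  # [0]
print("one-to-many positives:", sorted(o2m))  # [0, 1]
```

In this toy case query 1 overlaps the ground truth almost as well as query 0, yet one-to-one matching alone would train it as a negative. The one-to-many rule keeps it as a positive, which is the false-negative effect the abstract describes. The same count also shows the sparsity problem: with fewer than 10 annotations and over 100 queries, one-to-one matching leaves under 10% of queries with a positive label.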

