CV · Jul 12, 2021

Scenes and Surroundings: Scene Graph Generation using Relation Transformer

arXiv:2107.05448v1 · 8 citations
Originality: Highly original
AI Analysis

This work addresses the problem of understanding complex visual relationships in images, a core computer vision task. It represents an incremental advancement, reporting an overall mean improvement of 4.85% on the Visual Genome dataset.

The paper tackles the challenging task of scene graph generation by proposing a novel relation transformer architecture that captures contextual dependencies between objects and their relationships. It reports an overall mean improvement of 4.85% over state-of-the-art approaches and sets a new benchmark across the scene graph generation tasks on the Visual Genome dataset.

Identifying the objects in an image and their mutual relationships as a scene graph leads to a deep understanding of image content. Despite recent advances in deep learning, detecting and labeling visual object relationships remains a challenging task. This work proposes a novel local-context-aware architecture named relation transformer, which exploits complex global object-to-object and object-to-edge (relation) interactions. Our hierarchical multi-head attention-based approach efficiently captures contextual dependencies between objects and predicts their relationships. In comparison to state-of-the-art approaches, we have achieved an overall mean 4.85% improvement and a new benchmark across all the scene graph generation tasks on the Visual Genome dataset.
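To make the object-to-object attention idea concrete, here is a minimal NumPy sketch of generic multi-head self-attention over a set of per-object feature vectors. It is not the authors' relation transformer: the function name, the random weights, and the sizes (5 objects, 16 dimensions, 4 heads) are assumptions chosen for illustration, and the hierarchical and object-to-edge stages described in the abstract are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(obj_feats, w_q, w_k, w_v, w_o, num_heads):
    """Generic scaled dot-product multi-head self-attention.

    obj_feats: (n_objects, dim) array, one feature vector per detected object.
    Returns the contextualized features (n_objects, dim) and the attention
    weights (num_heads, n_objects, n_objects).
    """
    n, d = obj_feats.shape
    d_head = d // num_heads

    def split_heads(x):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(obj_feats @ w_q)
    k = split_heads(obj_feats @ w_k)
    v = split_heads(obj_feats @ w_v)

    # Every object attends to every other object (global context).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)

    # Merge heads back to (n, d) and apply the output projection.
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return merged @ w_o, attn


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
num_objects, dim, heads = 5, 16, 4
feats = rng.standard_normal((num_objects, dim))
w_q, w_k, w_v, w_o = (
    rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4)
)
context, attn = multi_head_self_attention(feats, w_q, w_k, w_v, w_o, heads)
```

Each row of `attn[h]` is a distribution over all objects in the image, which is one way a model can let an object's representation depend on its surroundings before a relation is predicted.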

Code Implementations: 1 repo
