CV · LG · Jul 11, 2021

Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration

arXiv:2107.05080v1 · 52 citations
Originality: Highly original
AI Analysis

The paper addresses the inability of existing scene graph generation frameworks to handle unseen relations, a limitation that matters for downstream visual understanding and reasoning tasks.

The paper tackles the problem of zero-shot relation prediction in scene graph generation by integrating commonsense knowledge, achieving improved performance on unseen triplets as demonstrated on Visual Genome datasets.

Relation prediction among entities in images is an important step in scene graph generation (SGG), which further impacts various visual understanding and reasoning tasks. Existing SGG frameworks, however, require heavy training yet are incapable of modeling unseen (i.e., zero-shot) triplets. In this work, we stress that such incapability is due to the lack of commonsense reasoning, i.e., the ability to associate similar entities and infer similar relations based on general understanding of the world. To fill this gap, we propose CommOnsense-integrAted sCene grapH rElation pRediction (COACHER), a framework to integrate commonsense knowledge for SGG, especially for zero-shot relation prediction. Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph, and integrate them on top of state-of-the-art SGG frameworks. Extensive quantitative evaluations and qualitative case studies on both original and manipulated datasets from Visual Genome demonstrate the effectiveness of our proposed approach.
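
To make "neighborhoods and paths around entities" concrete, the following is a minimal Python sketch on a toy graph. It is not the COACHER pipeline: the triples, the relation labels, and the function names (`build_adjacency`, `neighborhood`, `simple_paths`) are all hypothetical, and the abstract does not specify how the paper retrieves or encodes these structures. The sketch only shows the two graph queries the abstract names: collecting nodes within a few hops of an entity, and enumerating short paths between two entities.

```python
from collections import deque

# Hypothetical toy commonsense graph as (head, relation, tail) triples.
# The data and relation labels are illustrative assumptions only.
TRIPLES = [
    ("man", "IsA", "person"),
    ("woman", "IsA", "person"),
    ("person", "CapableOf", "ride"),
    ("horse", "UsedFor", "ride"),
    ("bike", "UsedFor", "ride"),
]


def build_adjacency(triples):
    """Index triples as an undirected map: node -> list of (relation, neighbor)."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        adj.setdefault(tail, []).append((rel, head))
    return adj


def neighborhood(adj, entity, hops=1):
    """Return the set of nodes within `hops` edges of `entity`, excluding itself."""
    seen = {entity}
    frontier = deque([(entity, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for _, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(entity)
    return seen


def simple_paths(adj, source, target, max_edges=3):
    """Enumerate paths without repeated nodes from source to target, at most max_edges long."""
    paths = []

    def dfs(node, path):
        if node == target:
            paths.append(list(path))
            return
        if len(path) - 1 == max_edges:
            return  # edge budget exhausted
        for _, nxt in adj.get(node, []):
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(source, [source])
    return paths


adj = build_adjacency(TRIPLES)
two_hop = neighborhood(adj, "man", hops=2)
man_to_horse = simple_paths(adj, "man", "horse", max_edges=3)
```

In this toy graph, "man" and "horse" are linked only through "person" and "ride", which mirrors the abstract's intuition that a relation never seen for one entity pair might be inferred from how similar entities connect in the commonsense graph.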

Code Implementations: 1 repo