CV | AI | Jan 27, 2022

Constrained Structure Learning for Scene Graph Generation

arXiv:2201.11697v1 | 12 citations
AI Analysis

This work addresses scene graph generation, a structured prediction task in computer vision, and proposes a constrained inference approach that the authors report outperforms existing methods.

The paper tackles the problem of scene graph generation by proposing a constrained structure learning method that uses entropic mirror descent for variational inference, achieving state-of-the-art performance on popular benchmarks.

As a structured prediction task, scene graph generation aims to build a visually-grounded scene graph to explicitly model objects and their relationships in an input image. Currently, the mean field variational Bayesian framework is the de facto methodology used by existing methods, in which the unconstrained inference step is often implemented by a message passing neural network. However, such a formulation fails to explore other inference strategies, and largely ignores the more general constrained optimization models. In this paper, we present a constrained structure learning method, for which an explicit constrained variational inference objective is proposed. Instead of applying the ubiquitous message-passing strategy, a generic constrained optimization method, entropic mirror descent, is used to solve the constrained variational inference step. We validate the proposed generic model on various popular scene graph generation benchmarks and show that it outperforms the state-of-the-art methods.
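To make the inference step named in the abstract more concrete, the following is a minimal sketch of entropic mirror descent on the probability simplex. It is not the paper's implementation: the toy objective (a free energy, expected energy minus entropy, over a single categorical variable), the function names `entropic_mirror_descent` and `free_energy_grad`, and the step size are all assumptions chosen for illustration. The simplex constraint is maintained by the multiplicative update itself, so no separate projection is needed.

```python
import math


def entropic_mirror_descent(grad, q0, step=0.5, iters=200):
    """Minimize a function over the probability simplex.

    Update rule (exponentiated gradient): q_i <- q_i * exp(-step * g_i) / Z.
    `grad` maps a distribution (list of floats) to its gradient.
    """
    q = list(q0)
    for _ in range(iters):
        g = grad(q)
        # Work in log space and subtract the max for numerical stability.
        logits = [math.log(qi) - step * gi for qi, gi in zip(q, g)]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        # Normalizing keeps q non-negative and summing to one.
        q = [w / z for w in weights]
    return q


def free_energy_grad(energies):
    """Gradient of F(q) = sum_i q_i * e_i + sum_i q_i * log(q_i).

    This toy objective is an assumption for illustration only.
    """
    def grad(q):
        return [e + math.log(qi) + 1.0 for e, qi in zip(energies, q)]
    return grad


# Hypothetical energies for three candidate labels (lower is better).
energies = [1.0, 2.0, 0.5]
q = entropic_mirror_descent(free_energy_grad(energies), [1 / 3, 1 / 3, 1 / 3])
print(q)
```

For this particular toy objective the minimizer is known in closed form (the softmax of the negated energies), which makes it a convenient sanity check for the update rule. The constrained objective used in the paper is more general and is not reproduced here.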

