CV · Jul 31, 2016

Visual Relationship Detection with Language Priors

arXiv:1608.00187v1 · 1231 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of scaling visual relationship detection in computer vision, though its contribution is incremental because it builds on prior work that uses language priors.

The paper tackles the problem of detecting visual relationships in images, such as 'man riding bicycle', by proposing a model that trains visual models for objects and predicates individually and combines them with language priors from semantic word embeddings, enabling prediction of thousands of relationship types from few examples and improving image retrieval.

Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. "man" and "bicycle") and predicates (e.g. "riding" and "pushing") independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content-based image retrieval.
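To make the abstract's idea concrete, the snippet below is a minimal sketch, not the authors' implementation. It assumes that a relationship score is the product of a visual term and a language term. The visual term multiplies the subject and object detector confidences by a per-predicate visual score. The language term is a per-predicate linear function of the concatenated subject and object word embeddings. All names (`EMBED`, `LANG_PARAMS`, `rank_images`) are hypothetical. The embeddings, weights, and image data are invented toy values, hand-set rather than learned. The retrieval step also makes an assumption: each image is ranked by its best-scoring candidate pair that matches a `<subject, predicate, object>` query.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical toy word embeddings; a real system would load pretrained vectors.
EMBED: Dict[str, Tuple[float, ...]] = {
    "man": (0.9, 0.1, 0.0),
    "bicycle": (0.1, 0.9, 0.2),
    "horse": (0.2, 0.8, 0.6),
}

# Hypothetical per-predicate language parameters (weights, bias) over the
# concatenated [subject; object] embedding (6 dimensions here). Hand-set, not learned.
LANG_PARAMS: Dict[str, Tuple[Tuple[float, ...], float]] = {
    "riding": ((1.0, 0.0, 0.0, 0.0, 0.5, 0.5), 0.0),
}


@dataclass
class Detection:
    label: str         # object category predicted by the object detector
    confidence: float  # detector probability for that category


# A candidate pair: subject box, object box, and per-predicate visual scores.
Pair = Tuple[Detection, Detection, Dict[str, float]]


def visual_score(subj: Detection, obj: Detection, predicate_score: float) -> float:
    """Combine separately trained object and predicate models multiplicatively."""
    return subj.confidence * predicate_score * obj.confidence


def language_score(subj: str, predicate: str, obj: str) -> float:
    """Linear score of the concatenated subject/object embeddings for one predicate."""
    weights, bias = LANG_PARAMS[predicate]
    features = EMBED[subj] + EMBED[obj]  # tuple concatenation -> 6-d feature vector
    return sum(w * x for w, x in zip(weights, features)) + bias


def relationship_score(pair: Pair, predicate: str) -> float:
    """Visual likelihood of the triple, reweighted by the language prior."""
    subj, obj, predicate_scores = pair
    vis = visual_score(subj, obj, predicate_scores.get(predicate, 0.0))
    return vis * language_score(subj.label, predicate, obj.label)


def rank_images(images: Dict[str, List[Pair]],
                query: Tuple[str, str, str]) -> List[Tuple[str, float]]:
    """Rank images by their best pair matching a <subject, predicate, object> query."""
    subj_label, predicate, obj_label = query
    scored = []
    for name, pairs in images.items():
        matches = [relationship_score(p, predicate) for p in pairs
                   if p[0].label == subj_label and p[1].label == obj_label]
        scored.append((name, max(matches, default=0.0)))  # no match -> score 0
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented example data: one candidate pair per image.
images = {
    "img_a": [(Detection("man", 0.9), Detection("bicycle", 0.8), {"riding": 0.7})],
    "img_b": [(Detection("man", 0.6), Detection("bicycle", 0.9), {"riding": 0.2})],
    "img_c": [(Detection("man", 0.95), Detection("horse", 0.9), {"riding": 0.8})],
}
ranking = rank_images(images, ("man", "riding", "bicycle"))
print(ranking)
```

The factorization reflects the abstract's key observation. With N object categories and K predicates, the model needs only N object classifiers and K predicate models, not a separate classifier for each of the N × N × K possible triples. A rare triple such as "man riding horse" can therefore still be scored from components that are individually common.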
