CV | Feb 1, 2019

VrR-VG: Refocusing Visually-Relevant Relationships

arXiv:1902.00313v2 | 36 citations
AI Analysis

This paper addresses biased visual relationship learning in scene understanding tasks, though the contribution is incremental because it builds on existing datasets such as Visual Genome.

The authors address the problem that visual relationship learning is biased by non-visual statistical patterns. They propose a method that prunes visually-irrelevant relationships, producing a new dataset (VrR-VG), and report that features learned on it improve image captioning and visual question answering by a large margin.

Relationships encode the interactions among individual instances and play a critical role in deep visual scene understanding. Because relationships are highly predictable from non-visual information, existing methods tend to fit the statistical bias rather than "learning" to "infer" the relationships from images. To encourage further development in visual relationships, we propose a novel method to automatically mine more valuable relationships by pruning visually-irrelevant ones. We construct a new scene-graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. Compared with existing datasets, the performance gap between learnable and statistical methods is more significant in VrR-VG, and frequency-based analysis no longer works. Moreover, we propose to learn a relationship-aware representation by jointly considering instances, attributes and relationships. By applying the relationship-aware features learned on VrR-VG, the performance of image captioning and visual question answering is systematically improved by a large margin, which demonstrates the gain of our dataset and the feature embedding scheme. VrR-VG is available via http://vrr-vg.com/.
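
To make the "statistical bias" concrete, here is a minimal sketch of a frequency-based predictor of the kind the abstract contrasts with learnable methods. It never looks at the image: it returns the predicate seen most often for a given pair of subject and object categories. The function names and the toy triples are hypothetical, chosen for illustration only; this is not the authors' pruning method or their evaluation code.

```python
from collections import Counter, defaultdict


def fit_frequency_prior(triples):
    """Count how often each predicate occurs for every (subject, object) category pair."""
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1
    return counts


def predict(counts, subj, obj, default=None):
    """Return the most frequent predicate for the pair, using no visual input at all."""
    pair_counts = counts.get((subj, obj))
    if not pair_counts:
        return default  # pair never seen in the training annotations
    return pair_counts.most_common(1)[0][0]


# Hypothetical toy annotations, not drawn from Visual Genome or VrR-VG.
train = [
    ("man", "wearing", "shirt"),
    ("man", "wearing", "shirt"),
    ("man", "holding", "shirt"),
    ("cup", "on", "table"),
]

prior = fit_frequency_prior(train)
guess = predict(prior, "man", "shirt")
```

When a dataset's predicates are dominated by such pair-level regularities, a lookup like this can look competitive with a trained model. The abstract's claim is that on VrR-VG this kind of shortcut stops working.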

