CV · Apr 4, 2019

VQD: Visual Query Detection in Natural Scenes

arXiv:1904.02794v2 · 1097 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses multi-object visual grounding, a problem of interest to computer vision researchers, but its contribution is incremental because it builds on the existing referring expression recognition task.

The authors introduce Visual Query Detection (VQD), a new visual grounding task in which a natural language query is used to localize multiple objects in an image. They also present the first dataset for the task and baseline algorithms whose results show that VQD is harder than single-object localization.

We propose Visual Query Detection (VQD), a new visual grounding task. In VQD, a system is guided by natural language to localize a variable number of objects in an image. VQD is related to visual referring expression recognition, where the task is to localize only one object. We describe the first dataset for VQD and we propose baseline algorithms that demonstrate the difficulty of the task compared to referring expression recognition.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
