CV | CL | LG | Oct 10, 2019

Visual Natural Language Query Auto-Completion for Estimating Instance Probabilities

arXiv:1910.04887v1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem at the intersection of computer vision and natural language processing, with applications such as image retrieval and interactive systems. It is incremental in that it builds on existing methods such as BERT.

The paper tackles the task of query auto-completion for estimating instance probabilities by completing user query prefixes conditioned on images and fine-tuning BERT embeddings to rank instances, showing that combining language and vision outperforms language-only approaches.

We present a new task of query auto-completion for estimating instance probabilities. We complete a user query prefix conditioned upon an image. Given the complete query, we fine-tune a BERT embedding for estimating probabilities of a broad set of instances. The resulting instance probabilities are used for selection while being agnostic to the segmentation or attention mechanism. Our results demonstrate that auto-completion using both language and vision performs better than using only language, and that fine-tuning a BERT embedding makes it possible to efficiently rank instances in the image. In the spirit of reproducible research, we make our data, models, and code available.
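
The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that the fine-tuned text encoder emits one score (logit) per instance class for a completed query, and that each score is converted to a probability independently with a sigmoid. The abstract does not specify this scoring function, so treat it as an assumption. The names `rank_instances`, `query_logits`, and `image_instances`, along with all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rank_instances(logits: dict[str, float], candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate instances by their per-class probability.

    `logits` maps instance class names to scores (assumed to come from a
    fine-tuned text encoder); `candidates` are the instances found in the
    image by any segmentation or attention mechanism.
    """
    scored = [(name, sigmoid(logits[name])) for name in candidates if name in logits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for the completed query "is the dog on the couch".
query_logits = {"dog": 2.0, "couch": 1.0, "lamp": -3.0, "cat": -1.0}

# Hypothetical instances present in the image.
image_instances = ["lamp", "dog", "couch"]

ranking = rank_instances(query_logits, image_instances)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

Because the ranking only consumes class names and scores, it stays independent of how the candidate instances were produced, which mirrors the abstract's point that selection is agnostic to the segmentation or attention mechanism.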

Code Implementations: 1 repo
