CV CL LG · Oct 10, 2019

Visual Natural Language Query Auto-Completion for Estimating Instance Probabilities

Samuel Sharpe, Jin Yan, Fan Wu, Iddo Drori

arXiv:1910.04887v1 · 0.9 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific problem at the intersection of computer vision and natural language processing, with applications such as image retrieval and interactive systems. It is incremental in that it builds on existing methods such as BERT.

The paper tackles query auto-completion for estimating instance probabilities. It completes a user's query prefix conditioned on an image, then fine-tunes BERT embeddings to rank instances, and shows that combining language and vision outperforms language-only approaches.

We present a new task of query auto-completion for estimating instance probabilities. We complete a user query prefix conditioned upon an image. Given the complete query, we fine-tune a BERT embedding for estimating probabilities of a broad set of instances. The resulting instance probabilities are used for selection while being agnostic to the segmentation or attention mechanism. Our results demonstrate that auto-completion using both language and vision performs better than using only language, and that fine-tuning a BERT embedding allows us to efficiently rank instances in the image. In the spirit of reproducible research, we make our data, models, and code available.
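As a rough illustration of the last step in the abstract (turning a completed query into instance probabilities and ranking them), here is a minimal sketch. It is not the authors' implementation: the fixed `query_embedding` vector stands in for a fine-tuned BERT embedding of the completed query, and the independent sigmoid head, the weights, the instance names, and the function names are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def instance_probabilities(query_embedding, weights, biases):
    """Score each candidate instance independently from the query embedding.

    Assumption: one linear unit plus sigmoid per instance (a multi-label head).
    The source does not specify the form of the output layer.
    """
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, query_embedding)) + b
        probs.append(sigmoid(logit))
    return probs


def rank_instances(probs, names, k=2):
    """Return the k instances with the highest estimated probability."""
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy values (hypothetical): a 3-dimensional stand-in for the query embedding
# and three candidate instances in the image.
query_embedding = [0.5, -1.0, 2.0]
names = ["dog", "frisbee", "tree"]
weights = [
    [1.0, 0.0, 1.0],   # dog
    [0.0, 1.0, 0.0],   # frisbee
    [-1.0, 0.0, 0.5],  # tree
]
biases = [0.0, 0.0, 0.0]

probs = instance_probabilities(query_embedding, weights, biases)
top = rank_instances(probs, names, k=2)
print(top)
```

Because the ranking depends only on the per-instance probabilities, whatever produces the instance masks or attention maps downstream can be swapped without changing this step, which matches the abstract's claim that selection is agnostic to the segmentation or attention mechanism.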
