CV, Mar 29, 2018

Unsupervised Textual Grounding: Linking Words to Image Concepts

Raymond A. Yeh, Minh N. Do, Alexander G. Schwing

arXiv:1803.11185v115.444 citations

Originality: Incremental advance

AI Analysis

This work addresses the cost of annotating training data for textual grounding, a task relevant to robotics and human-computer interaction. The advance is incremental because it builds on existing task formulations.

The paper tackles textual grounding, which links words to objects in images, with an unsupervised method that avoids the need for expensive labeled datasets. It reports improvements over baselines of 7.98% on the ReferIt Game dataset and 6.96% on the Flickr30k data.

Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. Training these deep-net-based approaches requires access to large-scale datasets; however, constructing such a dataset is time-consuming and expensive. Therefore, we develop a completely unsupervised mechanism for textual grounding that uses hypothesis testing to link words to detected image concepts. We demonstrate our approach on the ReferIt Game dataset and the Flickr30k data, outperforming baselines by 7.98% and 6.96%, respectively.
