CV | Aug 28, 2019

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

arXiv:1908.10568v1 | 93 citations
AI Analysis

This work addresses localizing objects in images from linguistic queries without supervision that links each query to its target region. The contribution is incremental but delivers strong gains for computer vision applications.

The paper tackles weakly supervised referring expression grounding with an adaptive reconstruction network (ARN), which matches image regions to queries through adaptive grounding and collaborative reconstruction and outperforms prior state-of-the-art methods by a large margin on four large-scale datasets.

Weakly supervised referring expression grounding aims to localize the referential object in an image according to a linguistic query, where the mapping between the referential object and the query is unknown during training. To address this problem, we propose a novel end-to-end adaptive reconstruction network (ARN). It builds the correspondence between image region proposals and the query in an adaptive manner through two components: adaptive grounding and collaborative reconstruction. Specifically, we first extract subject, location, and context features to represent both the proposals and the query. Then, we design an adaptive grounding module that computes the matching score between each proposal and the query with a hierarchical attention model. Finally, based on the attention scores and proposal features, we reconstruct the input query with a collaborative loss that combines a language reconstruction loss, an adaptive reconstruction loss, and an attribute classification loss. This adaptive mechanism helps our model cope with the variation among different referring expressions. Experiments on four large-scale datasets show that ARN outperforms existing state-of-the-art methods by a large margin. Qualitative results demonstrate that ARN better handles situations in which multiple objects of the same category are situated together.
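The following is a minimal plain-Python sketch of how the pieces named in the abstract could fit together in one forward pass. It is not the authors' implementation. Everything concrete here is an assumption: the per-module cosine matching, the query-derived module weights (taken as given rather than predicted by a language encoder), the softmax over proposals, the squared-error form of the adaptive reconstruction loss, the attention-pooled attribute logits, the loss weights `lam_adapt` and `lam_attr`, and treating the language reconstruction loss as a precomputed `lang_nll` from a sentence decoder that is not shown. All function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical names for the three feature types mentioned in the abstract.
MODULES = ("subject", "location", "context")
Features = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over proposal scores."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def grounding_scores(proposals: List[Features], query: Features,
                     module_weights: Dict[str, float]) -> List[float]:
    """Assumed two-level matching: per-module similarity, then a query-weighted sum."""
    return [sum(module_weights[m] * cosine(p[m], query[m]) for m in MODULES)
            for p in proposals]


def attend(vectors: List[List[float]], attention: List[float]) -> List[float]:
    """Attention-weighted sum of per-proposal vectors."""
    return [sum(a * v[k] for a, v in zip(attention, vectors))
            for k in range(len(vectors[0]))]


def bce_with_logits(z: float, y: float) -> float:
    """Stable binary cross-entropy for one logit z and a target y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def collaborative_loss(proposals, query, module_weights, attr_logits, attr_labels,
                       lang_nll, lam_adapt=1.0, lam_attr=1.0):
    """Toy combination of the three loss terms; lang_nll is assumed precomputed."""
    attention = softmax(grounding_scores(proposals, query, module_weights))
    # Adaptive reconstruction (assumed form): rebuild each query phrase embedding
    # from attended proposal features, weighting each module's error by its weight.
    adaptive = 0.0
    for m in MODULES:
        rec = attend([p[m] for p in proposals], attention)
        adaptive += module_weights[m] * sum((r - q) ** 2 for r, q in zip(rec, query[m]))
    # Attribute classification (assumed form): attention-pooled logits vs. labels.
    pooled = attend(attr_logits, attention)
    attribute = sum(bce_with_logits(z, y) for z, y in zip(pooled, attr_labels)) / len(attr_labels)
    total = lang_nll + lam_adapt * adaptive + lam_attr * attribute
    return {"attention": attention, "adaptive": adaptive,
            "attribute": attribute, "total": total}


# Toy usage: proposal 0 matches the query in every module; proposal 1 is its opposite.
proposals = [
    {"subject": [1.0, 0.0], "location": [0.0, 1.0], "context": [1.0, 1.0]},
    {"subject": [-1.0, 0.0], "location": [0.0, -1.0], "context": [-1.0, -1.0]},
]
query = {"subject": [1.0, 0.0], "location": [0.0, 1.0], "context": [1.0, 1.0]}
weights = {"subject": 0.5, "location": 0.3, "context": 0.2}
example = collaborative_loss(proposals, query, weights,
                             attr_logits=[[2.0, -2.0], [-2.0, 2.0]],
                             attr_labels=[1.0, 0.0], lang_nll=2.0)
print(example["attention"], example["total"])
```

Reusing the same module weights for matching and for the reconstruction error is one simple way to make the reconstruction "adaptive" to whichever part of the expression (subject, location, or context) a query stresses; the paper's actual formulation may differ.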

Code Implementations: 1 repo