CV · Dec 7, 2019

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

arXiv:1912.03478v1 · 81 citations
Originality: Incremental advance
AI Analysis

This work addresses the computational inefficiency in REC, enabling real-time applications, but it is incremental as it builds on prior methods like MAttNet.

The paper tackles the problem of slow multi-stage pipelines in Referring Expression Comprehension (REC) by proposing a one-stage model called RealGIN, which achieves competitive performance on five benchmark datasets and boosts processing speed by about 10 times compared to existing methods.

Referring Expression Comprehension (REC) is an emerging research topic in computer vision, which refers to detecting the target region in an image given a text description. Most existing REC methods follow a multi-stage pipeline, which is computationally expensive and greatly limits the application of REC. In this paper, we propose a one-stage model towards real-time REC, termed Real-time Global Inference Network (RealGIN). RealGIN addresses the diversity and complexity issues in REC with two innovative designs: the Adaptive Feature Selection (AFS) and the Global Attentive ReAsoNing unit (GARAN). AFS adaptively fuses features at different semantic levels to handle the varying content of expressions. GARAN uses the textual feature as a pivot to collect expression-related visual information from all regions, and then selectively diffuses this information back to all regions, which provides sufficient context for modeling the complex linguistic conditions in expressions. On five benchmark datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, ReferIt and Flickr30k, the proposed RealGIN outperforms most prior works and achieves very competitive performance against the most advanced method, i.e., MAttNet. Most importantly, under the same hardware, RealGIN can boost the processing speed by about 10 times over the existing methods.
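The abstract describes AFS and GARAN only at a high level, so the following is a minimal illustrative sketch of the two ideas, not the paper's implementation. It assumes that level weights come from a softmax over text-to-feature similarity and that diffusion uses a per-region sigmoid gate; the function names (`adaptive_fuse`, `collect_and_diffuse`), the dot-product scoring, and the toy vectors are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_fuse(level_feats, text):
    """AFS-like step (assumed form): weight each semantic level by its
    mean similarity to the text vector, then sum the levels per region."""
    scores = [sum(dot(r, text) for r in feats) / len(feats) for feats in level_feats]
    weights = softmax(scores)
    n_regions, dim = len(level_feats[0]), len(text)
    fused = [[sum(w * feats[i][d] for w, feats in zip(weights, level_feats))
              for d in range(dim)] for i in range(n_regions)]
    return fused, weights

def collect_and_diffuse(regions, text):
    """GARAN-like step (assumed form): collect, then diffuse."""
    dim = len(text)
    # Collect: text-guided attention over all regions gives one context vector.
    attn = softmax([dot(r, text) for r in regions])
    context = [sum(a * r[d] for a, r in zip(attn, regions)) for d in range(dim)]
    # Diffuse: each region adds the context, scaled by its own gate in (0, 1).
    gates = [sigmoid(dot(r, context)) for r in regions]
    updated = [[r[d] + g * context[d] for d in range(dim)]
               for r, g in zip(regions, gates)]
    return updated, attn, gates

# Toy example: 3 regions, 2-dimensional features, 2 semantic levels.
text = [1.0, 0.0]
low = [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
high = [[0.9, 0.1], [0.1, 0.2], [0.0, 0.3]]
fused, level_weights = adaptive_fuse([low, high], text)
updated, attn, gates = collect_and_diffuse(fused, text)
```

In this toy setup the `high` level aligns better with the text vector, so it receives the larger fusion weight, and the first region receives the most attention during collection. A real model would use learned projections and convolutional feature maps rather than raw dot products on hand-written lists.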

Code Implementations: 1 repo