CV, Dec 10, 2021

LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization

arXiv:2112.05291v1, 62 citations
Originality: Incremental advance
AI Analysis

This work addresses incomplete object localization in WSOL. It is an incremental improvement that combines local and global features in transformers.

The paper tackles weakly supervised object localization (WSOL). Transformers lack the locality inductive bias of CNNs, which can degrade local feature details and leave part of the object extent uncovered. The proposed LCTR enhances the local perception of global features and achieves improved performance on the CUB-200-2011 and ILSVRC datasets.

Weakly supervised object localization (WSOL) aims to learn an object localizer using only image-level labels. Convolutional neural network (CNN) based techniques often highlight the most discriminative part of an object while ignoring its full extent. Recently, the transformer architecture has been applied to WSOL to capture long-range feature dependencies with a self-attention mechanism and a multilayer perceptron structure. Nevertheless, transformers lack the locality inductive bias inherent to CNNs and may therefore degrade local feature details in WSOL. In this paper, we propose a novel framework built upon the transformer, termed LCTR (Local Continuity TRansformer), which aims to enhance the local perception capability of global features among long-range feature dependencies. To this end, we propose a relational patch-attention module (RPAM), which considers cross-patch information on a global basis. We further design a cue digging module (CDM), which uses local features to guide the learning trend of the model toward highlighting weak local responses. Finally, comprehensive experiments are carried out on two widely used datasets, i.e., CUB-200-2011 and ILSVRC, to verify the effectiveness of our method.
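To make the "most discriminative part" problem concrete, here is a minimal sketch of a common WSOL post-processing step: threshold a localization map relative to its peak and take the tight box around the surviving cells. This is a generic illustration, not LCTR's code. The function name `map_to_box`, the `rel_threshold` parameter, and both toy maps are hypothetical, and the sketch boxes all above-threshold cells rather than only the largest connected region.

```python
def map_to_box(score_map, rel_threshold=0.5):
    """Return (row_min, col_min, row_max, col_max) around every cell whose
    score is at least rel_threshold times the map's peak score.

    Assumes a non-empty 2D list of non-negative scores (hypothetical input).
    """
    peak = max(max(row) for row in score_map)
    cutoff = rel_threshold * peak
    hits = [
        (r, c)
        for r, row in enumerate(score_map)
        for c, value in enumerate(row)
        if value >= cutoff
    ]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


# Toy map where only one discriminative cell responds strongly.
part_only = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.1, 0.0],
    [0.0, 0.2, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

# Toy map where the weaker neighbouring responses have been raised.
full_extent = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.7, 0.6, 0.0],
    [0.0, 0.8, 0.6, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

print(map_to_box(part_only))    # (1, 1, 1, 1)
print(map_to_box(full_extent))  # (1, 1, 2, 3)
```

In the first map the box collapses onto a single cell, which is the failure mode the abstract attributes to part-focused responses. In the second, stronger responses in neighbouring cells widen the box toward the full object.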
