CV
Aug 8, 2020

RPT: Learning Point Set Representation for Siamese Visual Tracking

arXiv:2008.03467v2
74 citations
AI Analysis

This paper addresses the coarseness of bounding box representation in visual tracking, offering more accurate target state estimation for applications like surveillance and robotics.

The paper tackles the problem of inaccurate target state estimation in visual tracking by proposing a framework that uses a set of representative points for finer representation, achieving new state-of-the-art performance on benchmarks like OTB2015, VOT2018, VOT2019, and GOT-10k while running at over 20 FPS.

While remarkable progress has been made in robust visual tracking, accurate target state estimation still remains a highly challenging problem. In this paper, we argue that this issue is closely related to the prevalent bounding box representation, which provides only a coarse spatial extent of object. Thus an efficient visual tracking framework is proposed to accurately estimate the target state with a finer representation as a set of representative points. The point set is trained to indicate the semantically and geometrically significant positions of target region, enabling more fine-grained localization and modeling of object appearance. We further propose a multi-level aggregation strategy to obtain detailed structure information by fusing hierarchical convolution layers. Extensive experiments on several challenging benchmarks including OTB2015, VOT2018, VOT2019 and GOT-10k demonstrate that our method achieves new state-of-the-art performance while running at over 20 FPS.
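
To make the contrast between a point set and a bounding box concrete, here is a minimal sketch in Python. It assumes a simple min-max rule for collapsing a point set into an axis-aligned box. That rule, the function name `points_to_box`, and the example coordinates are illustrative assumptions, not details taken from the paper. The sketch only shows that a box can be recovered from a point set, while the box alone discards where the points sit inside it.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def points_to_box(points: List[Point]) -> Box:
    """Collapse a point set into the tightest axis-aligned box.

    Assumption: a plain min-max rule, chosen for illustration only.
    """
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical point set lying along a diagonal, elongated target.
example_points = [(10.0, 40.0), (20.0, 32.0), (30.0, 25.0), (40.0, 18.0), (50.0, 10.0)]

# The box covers the full 40 x 30 region, although the points
# occupy only a thin diagonal strip inside it.
example_box = points_to_box(example_points)
```

In this example the resulting box spans the whole rectangle around the target, which is the coarse spatial extent the abstract refers to, whereas the points themselves trace the target's actual layout.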
