CV · Dec 31, 2018

SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks

arXiv:1812.11703v1 · 2175 citations
Originality: Incremental advance
AI Analysis

This work improves visual tracking accuracy for applications such as surveillance and robotics, though it is an incremental advance that builds on existing Siamese trackers.

The paper tackles the accuracy gap in Siamese network-based visual trackers by addressing their lack of strict translation invariance. This enables the use of deep backbones such as ResNet-50 and yields state-of-the-art results on the OTB2015, VOT2018, UAV123, and LaSOT benchmarks.
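Strict translation invariance, in this setting, means that when the target moves inside the search region, the peak of the template/search cross-correlation response moves by exactly the same offset. The sketch below checks this for plain unpadded cross-correlation on single-channel toy maps. It assumes no backbone network at all, and the helper `place`, the 8x8 map size, and the template values are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

Map = List[List[float]]


def xcorr2d(search: Map, kernel: Map) -> Map:
    """Valid (unpadded) single-channel 2-D cross-correlation of a kernel over a search map."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(search) - kh + 1, len(search[0]) - kw + 1
    return [[sum(search[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def peak(response: Map) -> Tuple[int, int]:
    """(row, col) of the largest value in a response map."""
    _, i, j = max((v, i, j) for i, row in enumerate(response) for j, v in enumerate(row))
    return i, j


def place(template: Map, size: int, top: int, left: int) -> Map:
    """Hypothetical helper: paste the template into an all-zero size x size search map."""
    grid = [[0.0] * size for _ in range(size)]
    for a, row in enumerate(template):
        for b, v in enumerate(row):
            grid[top + a][left + b] = v
    return grid


template = [[1.0, 2.0], [3.0, 4.0]]
p1 = peak(xcorr2d(place(template, 8, 2, 3), template))
p2 = peak(xcorr2d(place(template, 8, 4, 1), template))  # same target moved by (+2, -2)
print(p1, p2)  # (2, 3) (4, 1)
```

Because `xcorr2d` uses no padding, every output position sees a full input window of the same kind, so the response peak shifts by exactly the target's offset.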

Siamese network-based trackers formulate tracking as convolutional feature cross-correlation between a target template and a search region. However, Siamese trackers still have an accuracy gap compared with state-of-the-art algorithms, and they cannot take advantage of features from deep networks such as ResNet-50 or deeper. In this work we prove that the core reason is the lack of strict translation invariance. Through comprehensive theoretical analysis and experimental validation, we break this restriction with a simple yet effective spatial-aware sampling strategy and successfully train a ResNet-driven Siamese tracker with a significant performance gain. Moreover, we propose a new model architecture that performs depth-wise and layer-wise aggregation, which not only further improves accuracy but also reduces model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which currently obtains the best results on four large tracking benchmarks: OTB2015, VOT2018, UAV123, and LaSOT. Our model will be released to facilitate further studies on this problem.
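The depth-wise and layer-wise aggregation mentioned in the abstract can be illustrated with a minimal sketch. Depth-wise cross-correlation correlates each template channel only with the matching search channel, so the response keeps one map per channel; layer-wise aggregation then fuses responses computed from several backbone layers with a weighted sum. This is a toy version built on assumptions: the feature maps are hand-written numbers rather than ResNet-50 activations, `collapse_channels` simply sums channels as a hypothetical stand-in for the learned prediction heads, and the fusion weights are fixed constants rather than learned parameters.

```python
from typing import List

Map = List[List[float]]


def xcorr2d(search: Map, kernel: Map) -> Map:
    """Valid (unpadded) single-channel 2-D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(search) - kh + 1, len(search[0]) - kw + 1
    return [[sum(search[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def depthwise_xcorr(search: List[Map], template: List[Map]) -> List[Map]:
    """Correlate template channel c with search channel c only; the output keeps C channels."""
    if len(search) != len(template):
        raise ValueError("search and template must have the same channel count")
    return [xcorr2d(s, t) for s, t in zip(search, template)]


def collapse_channels(maps: List[Map]) -> Map:
    """Hypothetical stand-in for a learned head: sum the C response channels into one map."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) for j in range(w)] for i in range(h)]


def layerwise_fuse(responses: List[Map], weights: List[float]) -> Map:
    """Weighted sum of same-shaped per-layer response maps; weights are normalized to sum to 1."""
    total = float(sum(weights))
    h, w = len(responses[0]), len(responses[0][0])
    return [[sum(wt / total * r[i][j] for r, wt in zip(responses, weights))
             for j in range(w)] for i in range(h)]


# Two hypothetical backbone layers, each with C=2 channels of 3x3 search features
# and 2x2 template features (toy numbers, not real network activations).
search_l1 = [[[1, 0, 0], [0, 1, 0], [0, 0, 1]], [[0, 0, 0], [0, 2, 0], [0, 0, 0]]]
template_l1 = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
search_l2 = [[[0, 0, 0], [0, 0, 0], [0, 0, 3]], [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
template_l2 = [[[1, 1], [1, 1]], [[0, 0], [0, 1]]]

resp_l1 = collapse_channels(depthwise_xcorr(search_l1, template_l1))  # [[4, 2], [2, 4]]
resp_l2 = collapse_channels(depthwise_xcorr(search_l2, template_l2))  # [[1, 1], [1, 4]]
fused = layerwise_fuse([resp_l1, resp_l2], weights=[1.0, 3.0])
print(fused)  # [[1.75, 1.25], [1.25, 4.0]]
```

Keeping one response channel per feature channel, instead of collapsing everything in a single full correlation, is what lets later layers weigh channels differently; the weighted fusion assumes all layers produce responses of the same spatial size.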

Code Implementations: 13 repos

Data from Papers with Code (CC-BY-SA-4.0)

