CV · Feb 7, 2019

SiamVGG: Visual Tracking using Deeper Siamese Networks

arXiv:1902.02804v4 · 53 citations · Has Code
AI Analysis

This work addresses the need for efficient and accurate visual tracking in applications such as surveillance and robotics, though it is incremental: it builds on existing Siamese network and DCF methods.

The paper tackles the problem of achieving both high accuracy and real-time performance in visual tracking. It proposes SiamVGG, which combines a CNN backbone customized from VGG-16 with a cross-correlation operator, and reports state-of-the-art accuracy on the OTB-2013/50/100 and VOT 2015/2016/2017 datasets at a real-time speed of 50 FPS on a GTX 1080Ti.

Recently, we have seen rapid development of Deep Neural Network (DNN) based visual tracking solutions. Some trackers combine DNN-based solutions with Discriminative Correlation Filters (DCF) to extract semantic features and deliver state-of-the-art tracking accuracy. However, these solutions are highly compute-intensive and require long processing times, so real-time performance is not guaranteed. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called SiamVGG (https://github.com/leeyeehoo/SiamVGG). It combines a Convolutional Neural Network (CNN) backbone with a cross-correlation operator and takes advantage of the features from exemplar images for more accurate object tracking. The architecture of SiamVGG is customized from VGG-16, with parameters shared between the exemplar images and the input video frames. We evaluate SiamVGG on the OTB-2013/50/100 and VOT 2015/2016/2017 datasets, where it achieves state-of-the-art accuracy while maintaining real-time performance of 50 FPS on a GTX 1080Ti. Our design achieves 2% higher Expected Average Overlap (EAO) than ECO and C-COT in the VOT2017 Challenge.
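The core idea in the abstract, a shared backbone applied to both the exemplar and the search frame, followed by cross-correlation, can be illustrated with a toy sketch. This is not the authors' implementation: `embed` is a hypothetical stand-in for the VGG-16-derived backbone (it only subtracts the mean), the inputs are tiny single-channel grids, and everything is written in pure Python so it runs without any deep learning library.

```python
def embed(image):
    """Hypothetical stand-in for the shared backbone (NOT VGG-16).

    Subtracts the mean so the correlation responds to the pattern
    rather than to overall brightness.
    """
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    return [[v - mean for v in row] for row in image]


def cross_correlate(exemplar_feat, search_feat):
    """Slide the exemplar features over the search features.

    Returns a response map whose entry (y, x) is the dot product of the
    exemplar with the search window whose top-left corner is (y, x).
    """
    eh, ew = len(exemplar_feat), len(exemplar_feat[0])
    sh, sw = len(search_feat), len(search_feat[0])
    response = []
    for y in range(sh - eh + 1):
        row = []
        for x in range(sw - ew + 1):
            score = sum(
                exemplar_feat[i][j] * search_feat[y + i][x + j]
                for i in range(eh)
                for j in range(ew)
            )
            row.append(score)
        response.append(row)
    return response


def locate(exemplar, search):
    """Return the best-matching (y, x) offset and the full response map.

    The same `embed` function is applied to both inputs, mirroring the
    parameter sharing between the two Siamese branches.
    """
    response = cross_correlate(embed(exemplar), embed(search))
    _, position = max(
        (score, (y, x))
        for y, row in enumerate(response)
        for x, score in enumerate(row)
    )
    return position, response


# Toy usage: the 2x2 exemplar pattern is placed at row 1, column 2 of the search grid.
exemplar = [[9, 1],
            [1, 9]]
search = [[0, 0, 0, 0],
          [0, 0, 9, 1],
          [0, 0, 1, 9],
          [0, 0, 0, 0]]
position, response = locate(exemplar, search)
print(position)
```

In a real tracker the peak of the response map would be mapped back to image coordinates to update the target's bounding box; that step, and the learned multi-channel features, are omitted here.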

Code Implementations · 4 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
