Siamese Keypoint Prediction Network for Visual Object Tracking
This work addresses the challenge of improving tracking accuracy and resistance to distractors for applications in video analysis, representing an incremental advancement over existing siamese trackers.
The paper tackles visual object tracking by proposing a Siamese keypoint prediction network (SiamKPN) that uses a cascade heatmap strategy for coarse-to-fine prediction. It performs well against state-of-the-art trackers on benchmarks such as OTB-100 and VOT2018 while running at real-time speed.
Visual object tracking aims to estimate the location of an arbitrary target in a video sequence given its initial bounding box. By utilizing offline feature learning, the siamese paradigm has recently become the leading framework for high-performance tracking. However, existing siamese trackers either rely heavily on complicated anchor-based detection networks or lack the ability to resist distractors. In this paper, we propose the Siamese keypoint prediction network (SiamKPN) to address these challenges. On top of a Siamese backbone for feature embedding, SiamKPN benefits from a cascade heatmap strategy for coarse-to-fine prediction modeling. In particular, the strategy is implemented by sequentially shrinking the coverage of the label heatmap along the cascade, which applies loose-to-strict intermediate supervision. During inference, we find that the predicted heatmaps of successive stages gradually concentrate on the target and weaken their responses to distractors. SiamKPN performs well against state-of-the-art trackers for visual object tracking on four benchmark datasets, including OTB-100, VOT2018, LaSOT and GOT-10k, while running at real-time speed.
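The idea of shrinking the label heatmap's coverage along the cascade can be illustrated with a minimal sketch. The abstract does not specify the label shape or its parameters, so the Gaussian form, the 25x25 map size, the per-stage `sigmas` values, and the function names `gaussian_label` and `cascade_labels` are all assumptions made for illustration, not the paper's actual settings.

```python
import numpy as np

def gaussian_label(size, center, sigma):
    """Return a (size, size) heatmap that peaks at 1.0 on center = (cx, cy).

    A Gaussian label is an assumption; the abstract only says that the
    label heatmap's coverage shrinks from stage to stage.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    cx, cy = center
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cascade_labels(size, center, sigmas=(4.0, 2.0, 1.0)):
    """Build one label per cascade stage, from loose (wide) to strict (narrow).

    The sigma values are hypothetical and chosen only to show the trend.
    """
    return [gaussian_label(size, center, s) for s in sigmas]

labels = cascade_labels(size=25, center=(12, 12))

# Coverage here means the number of cells whose label exceeds 0.5;
# it should decrease from the first stage to the last.
coverage = [int((h > 0.5).sum()) for h in labels]
print(coverage)
```

In this sketch every stage shares the same peak location, and only the spread of the positive region changes, so earlier stages receive looser supervision and later stages stricter supervision.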