DMV: Visual Object Tracking via Part-level Dense Memory and Voting-based Retrieval
This work addresses appearance changes and occlusions in visual object tracking, offering an incremental improvement over existing Siamese trackers.
The authors tackle the limitation of single-template matching in visual object tracking by introducing a memory-based tracker that uses part-level dense memory and voting-based retrieval, achieving results comparable to state-of-the-art methods on multiple benchmarks.
We propose DMV, a novel memory-based tracker built on part-level dense memory and voting-based retrieval. Since deep learning techniques were introduced to the tracking field, Siamese trackers have attracted many researchers because they balance speed and accuracy. However, most of them rely on single-template matching, which limits performance because it restricts the accessible information to the initial target features. In this paper, we relieve this limitation by maintaining an external memory that stores the tracking record. Part-level retrieval from the memory also frees the information from the template and allows our tracker to better handle challenges such as appearance changes and occlusions. By updating the memory during tracking, the representative power for the target object can be enhanced without online learning. We also propose a novel voting mechanism for memory reading that filters out unreliable information in the memory. We comprehensively evaluate our tracker on OTB-100, TrackingNet, GOT-10k, LaSOT, and UAV123; the results show that our method is comparable to state-of-the-art methods.
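To make the idea of an external part-level memory with a filtered read concrete, the following is a minimal sketch and not the DMV implementation. The abstract does not specify how the voting works, so a plain top-k majority vote over cosine similarities is swapped in here as an assumption. The class `PartMemory`, its `write` and `read` methods, the `capacity` and `k` parameters, and the boolean target/background label are all hypothetical names introduced for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PartMemory:
    """Hypothetical external memory of part-level features (illustrative only)."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.entries = []  # list of (feature, is_target) pairs, oldest first

    def write(self, part_features, labels):
        """Store the parts of one tracked frame; no online learning is involved."""
        for feature, is_target in zip(part_features, labels):
            self.entries.append((list(feature), bool(is_target)))
        # Assumed eviction policy: drop the oldest parts once capacity is exceeded.
        overflow = len(self.entries) - self.capacity
        if overflow > 0:
            del self.entries[:overflow]

    def read(self, query, k=3):
        """Retrieve the k most similar stored parts and let them vote.

        Returns (is_target, score). The score averages only the similarities
        of entries that agree with the majority, so outvoted entries are
        filtered out of the read.
        """
        scored = sorted(
            ((cosine(query, feature), is_target) for feature, is_target in self.entries),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        if not scored:
            return None, 0.0
        target_votes = sum(1 for _, is_target in scored if is_target)
        majority = target_votes * 2 > len(scored)  # ties fall back to background
        agreeing = [sim for sim, is_target in scored if is_target == majority]
        return majority, sum(agreeing) / len(agreeing)


memory = PartMemory(capacity=10)
# Frame 1: two target parts and one background part.
memory.write([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], [True, True, False])
# Frame 2: an unreliable entry that looks like the target but is labelled background.
memory.write([[1.0, 0.05]], [False])

label, score = memory.read([1.0, 0.0], k=3)
```

In this toy run the mislabelled entry lands among the three nearest neighbours of the query, but the two target parts outvote it, so it does not contribute to the returned score. A real tracker would read dense feature maps rather than two-dimensional toy vectors.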