CV · Nov 23, 2021

Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking

arXiv:2111.11625v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work improves deformable object tracking for computer vision applications, offering incremental advancements over existing segmentation-based methods.

The paper tackles the tracking of deformable visual objects. Template-based trackers struggle with accurate state estimation under severe deformation, and segmentation-based trackers that rely only on first-frame target features lack the discriminative capacity to handle challenges such as distractors and appearance changes. The authors propose a dynamic compact memory embedding that updates target features online and uses point-to-global matching. The method achieves superior performance on VOT2016-2019, GOT-10K, TrackingNet, and LaSOT, and outperforms D3S and SiamMask on DAVIS2017.

Recently, template-based trackers have become the leading tracking algorithms, with promising efficiency and accuracy. However, the correlation operation between the query feature and the given template is effective only for target localization, which leads to state estimation errors, especially when the target undergoes severe deformation. To address this issue, segmentation-based trackers have been proposed that employ per-pixel matching to effectively improve the tracking of deformable objects. However, most existing trackers refer only to the target features in the initial frame and therefore lack the discriminative capacity to handle challenging factors such as similar distractors, background clutter, and appearance change. To this end, we propose a dynamic compact memory embedding to enhance the discrimination of segmentation-based deformable visual tracking. Specifically, we initialize a memory embedding with the target features in the first frame. During tracking, current target features that are highly correlated with the existing memory are used to update the memory embedding online. To further improve segmentation accuracy for deformable objects, we employ a point-to-global matching strategy that measures the correlation between the pixel-wise query features and the whole template, so as to capture more detailed deformation information. Extensive evaluations on six challenging tracking benchmarks (VOT2016, VOT2018, VOT2019, GOT-10K, TrackingNet, and LaSOT) demonstrate the superiority of our method over recent state-of-the-art trackers. In addition, our method outperforms the strong segmentation-based trackers D3S and SiamMask on the DAVIS2017 benchmark.
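The online memory update described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: features are plain Python lists, "correlation" is taken to be cosine similarity, and the names `CompactMemory`, `capacity`, and `tau` (the correlation threshold) are hypothetical. First-frame features are kept permanently, while features added online share a bounded number of slots and the oldest is evicted first, which is one simple way to keep the memory compact.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class CompactMemory:
    """Hypothetical compact memory of target feature vectors (illustrative only)."""

    def __init__(self, first_frame_feats: List[Vector], capacity: int, tau: float):
        if not first_frame_feats:
            raise ValueError("memory must be initialized with first-frame features")
        if capacity < len(first_frame_feats):
            raise ValueError("capacity must hold at least the first-frame features")
        self.initial = [list(f) for f in first_frame_feats]  # never evicted
        self.online: List[Vector] = []  # features added during tracking
        self.capacity = capacity  # hypothetical total size limit
        self.tau = tau  # hypothetical correlation threshold

    def entries(self) -> List[Vector]:
        """All memory entries: first-frame features, then online additions."""
        return self.initial + self.online

    def update(self, current_feats: List[Vector]) -> int:
        """Add current target features whose best match in memory is >= tau.

        Returns how many features passed the threshold this frame.
        """
        memory = self.entries()  # score against the memory as it was before this frame
        selected = [list(f) for f in current_feats
                    if max(cosine(f, m) for m in memory) >= self.tau]
        self.online.extend(selected)
        # Stay compact: drop the oldest online entries once capacity is exceeded.
        overflow = len(self.initial) + len(self.online) - self.capacity
        if overflow > 0:
            self.online = self.online[overflow:]
        return len(selected)


if __name__ == "__main__":
    mem = CompactMemory([[1.0, 0.0], [0.0, 1.0]], capacity=3, tau=0.9)
    print(mem.update([[0.95, 0.05], [-1.0, 0.0]]))  # 1: only the first is close to memory
    print(len(mem.entries()))  # 3
```

FIFO eviction is only one possible policy; merging a new feature into its most similar slot would also keep the memory bounded, and the paper's actual rule may differ.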
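Point-to-global matching can likewise be illustrated with a small sketch. Here every query pixel feature is compared with every template pixel feature, giving a full query-by-template similarity matrix, instead of sliding the whole template over the query as a single correlation kernel. The cosine similarity, the function names, the `temperature` parameter, and the softmax-weighted propagation of a template foreground mask are all assumptions chosen for illustration; they are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def point_to_global_similarity(query: List[Vector], template: List[Vector]) -> List[List[float]]:
    """S[i][j] = cosine similarity between query pixel i and template pixel j.

    Each query point is matched against all template points (point-to-global).
    """
    return [[cosine(q, t) for t in template] for q in query]


def propagate_mask(sim: List[List[float]], template_mask: List[float],
                   temperature: float = 0.1) -> List[float]:
    """Hypothetical aggregation of the similarity matrix into foreground scores.

    For each query pixel, a softmax over its similarities to all template pixels
    weights the template labels (1.0 = target, 0.0 = background).
    """
    scores = []
    for row in sim:
        if not row or len(row) != len(template_mask):
            raise ValueError("each row needs one similarity per template pixel")
        peak = max(row)  # subtract the maximum for numerical stability
        weights = [math.exp((s - peak) / temperature) for s in row]
        total = sum(weights)
        scores.append(sum(w * y for w, y in zip(weights, template_mask)) / total)
    return scores


if __name__ == "__main__":
    template = [[1.0, 0.0], [0.0, 1.0]]  # two template pixel features
    mask = [1.0, 0.0]  # the first template pixel belongs to the target
    query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    sim = point_to_global_similarity(query, template)
    print(propagate_mask(sim, mask))  # close to 1, close to 0, and 0.5
```

The matrix has one row per query pixel and one column per template pixel, so its cost grows with the product of the two sizes; that is the price of comparing each point with the whole template.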
