CV · Apr 15, 2024

Learning Tracking Representations from Single Point Annotations

arXiv:2404.09504v1 · 1 citation · h-index: 5 · 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality: Incremental advance
AI Analysis

The work addresses the annotation bottleneck in visual tracking, making tracker training cheaper and faster for researchers and practitioners. The advance is incremental because it builds on existing contrastive learning methods.

The paper tackles the cost of bounding box annotations for training deep trackers. It proposes learning tracking representations from single point annotations, which are 4.5x faster to annotate than bounding boxes. With the same number of training frames, the method achieves performance comparable to the fully supervised baseline while reducing annotation time by 78% and annotation fees by 85%.
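The 78% figure appears consistent with the 4.5x annotation speed-up: if a point takes 1/4.5 of the time a box takes, annotating the same frames saves 1 - 1/4.5 of the time. A minimal sketch of that arithmetic follows; the function name `time_saving` is hypothetical, and the 85% fee reduction is not derived here because the source gives no pricing details.

```python
def time_saving(speedup: float) -> float:
    """Fraction of annotation time saved when each label is `speedup` times faster.

    Assumes the same number of frames is annotated in both settings.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# 4.5x is the point-vs-box speed-up quoted in the summary above.
saving = time_saving(4.5)
print(f"{saving:.1%}")  # prints 77.8%, which rounds to the reported 78%
```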

Existing deep trackers are typically trained on large-scale video frames with annotated bounding boxes. However, these bounding boxes are expensive and time-consuming to annotate, in particular for large-scale datasets. In this paper, we propose to learn tracking representations from single point annotations (i.e., 4.5x faster to annotate than the traditional bounding box) in a weakly supervised manner. Specifically, we propose a soft contrastive learning (SoCL) framework that incorporates a target objectness prior into end-to-end contrastive learning. Our SoCL consists of adaptive positive and negative sample generation, which is memory-efficient and effective for learning tracking representations. We apply the learned representation of SoCL to visual tracking and show that our method can 1) achieve better performance than the fully supervised baseline trained with box annotations under the same annotation time cost; 2) achieve performance comparable to the fully supervised baseline when using the same number of training frames, while reducing annotation time cost by 78% and total fees by 85%; 3) be robust to annotation noise.
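To give a rough feel for what "soft" contrastive learning can mean, the sketch below weights each candidate sample by a score in [0, 1] instead of labelling it strictly positive or negative. It is a generic soft-target InfoNCE loss written under assumptions, not the SoCL formulation from the paper: the abstract does not specify the loss, and the function name, the `weights` argument (standing in for an objectness prior), and the temperature value are all hypothetical.

```python
import math


def soft_contrastive_loss(query, candidates, weights, temperature=0.1):
    """Cross-entropy between soft targets and a softmax over similarities.

    query:      embedding of the annotated target (list of floats)
    candidates: embeddings of sampled regions (list of lists of floats)
    weights:    assumed prior in [0, 1] that each candidate covers the target
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive weight")

    # Dot-product similarity scaled by the temperature.
    logits = [
        sum(q * k for q, k in zip(query, cand)) / temperature
        for cand in candidates
    ]

    # Numerically stable log-softmax.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
    log_probs = [l - log_z for l in logits]

    # Normalise the weights into a soft target distribution.
    targets = [w / total for w in weights]
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Toy example: the first candidate matches the query, the second does not.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]
low = soft_contrastive_loss(query, candidates, [0.9, 0.1], temperature=1.0)
high = soft_contrastive_loss(query, candidates, [0.1, 0.9], temperature=1.0)
print(low < high)  # the loss is smaller when the weight sits on the match
```

With one-hot weights this reduces to the standard InfoNCE loss, so the soft weights can be read as a relaxation of hard positive and negative labels.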
