CV, Jul 14, 2023

CoTracker: It is Better to Track Together

arXiv:2307.07635v3, 565 citations, h-index: 105
Originality: Highly original
AI Analysis

This paper addresses robust point tracking in video, a known bottleneck for computer vision applications, and proposes a novel method for it.

The paper tackles the problem of tracking many 2D points in long videos. It introduces CoTracker, a transformer-based model that tracks points jointly rather than independently, which improves accuracy and robustness. CoTracker outperforms prior trackers on standard point-tracking benchmarks.

We introduce CoTracker, a transformer-based model that tracks a large number of 2D points in long video sequences. Differently from most existing approaches that track points independently, CoTracker tracks them jointly, accounting for their dependencies. We show that joint tracking significantly improves tracking accuracy and robustness, and allows CoTracker to track occluded points and points outside of the camera view. We also introduce several innovations for this class of trackers, including using token proxies that significantly improve memory efficiency and allow CoTracker to track 70k points jointly and simultaneously at inference on a single GPU. CoTracker is an online algorithm that operates causally on short windows. However, it is trained utilizing unrolled windows as a recurrent network, maintaining tracks for long periods of time even when points are occluded or leave the field of view. Quantitatively, CoTracker substantially outperforms prior trackers on standard point-tracking benchmarks.
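The abstract describes an online tracker that processes short windows causally and carries tracks forward across them. The following is a minimal sketch of that windowing idea only, not the authors' implementation. The names `window_starts`, `track_online`, and `refine_window`, and the window length and stride values, are all assumptions made for illustration; `refine_window` stands in for the transformer, which would receive every point at once so that they can be refined jointly.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Track = List[Point]  # one (x, y) estimate per frame of a window


def window_starts(num_frames: int, window_len: int, stride: int) -> List[int]:
    """Start indices of overlapping windows that together cover every frame."""
    if num_frames <= window_len:
        return [0]
    starts = list(range(0, num_frames - window_len, stride))
    starts.append(num_frames - window_len)  # final window ends on the last frame
    return starts


def track_online(
    queries: List[Point],
    num_frames: int,
    refine_window: Callable[[List[Track]], List[Track]],  # hypothetical model stand-in
    window_len: int = 8,  # assumed value, not taken from the paper
    stride: int = 4,      # assumed value, not taken from the paper
) -> List[List[Optional[Point]]]:
    """Slide a short window over the video, passing estimates between windows."""
    # Overlap is required so each window starts on an already-estimated frame.
    assert 0 < stride < window_len
    tracks: List[List[Optional[Point]]] = [[None] * num_frames for _ in queries]
    for i, query in enumerate(queries):
        tracks[i][0] = query  # every point is queried at frame 0 in this sketch

    for start in window_starts(num_frames, window_len, stride):
        end = min(start + window_len, num_frames)
        # Initialise each window from earlier estimates; frames not seen yet
        # copy the most recent known position of the point.
        inits: List[Track] = []
        for track in tracks:
            last = track[start]
            window_init: Track = []
            for t in range(start, end):
                if track[t] is not None:
                    last = track[t]
                window_init.append(last)
            inits.append(window_init)
        # All points go through the model together (joint tracking).
        refined = refine_window(inits)
        for i, window_track in enumerate(refined):
            for offset, point in enumerate(window_track):
                tracks[i][start + offset] = point
    return tracks
```

With an identity function as `refine_window`, every point simply stays at its query position, which makes it easy to check that the windows cover the whole video. Because each window reads only frames up to its own end, the loop is causal in the sense the abstract uses.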

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
