Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories
This work addresses tracking pixels through occlusions in videos, for applications such as motion analysis, and offers an incremental improvement by updating a classic method with modern techniques.
The paper tackles long-range pixel tracking in videos by revisiting the particle video approach with modern components such as dense cost maps, iterative optimization, and learned appearance updates, and it compares favorably against state-of-the-art optical flow and feature tracking methods on trajectory estimation and keypoint label propagation benchmarks.
Tracking pixels in videos is typically studied as an optical flow estimation problem, where every pixel is described with a displacement vector that locates it in the next frame. Even though wider temporal context is freely available, prior efforts to take this into account have yielded only small gains over 2-frame methods. In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames. We re-build this classic approach using components that drive the current state-of-the-art in flow and object tracking, such as dense cost maps, iterative optimization, and learned appearance updates. We train our models using long-range amodal point trajectories mined from existing optical flow data that we synthetically augment with multi-frame occlusions. We test our approach in trajectory estimation benchmarks and in keypoint label propagation tasks, and compare favorably against state-of-the-art optical flow and feature tracking methods.
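To make the contrast between a per-frame displacement vector and a multi-frame trajectory concrete, the following is a minimal sketch of the naive baseline: chaining two-frame flow fields to follow one pixel across several frames. It is not the method proposed in the paper. The function name `chain_flows`, the list-of-lists flow layout, and the nearest-neighbour lookup are all assumptions made for illustration. Note that this kind of chaining has no way to carry a point through an occlusion, which is the limitation that motivates estimating trajectories directly.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: flow[y][x] = (dx, dy), the displacement of pixel
# (x, y) from one frame to the next.
Flow = List[List[Tuple[float, float]]]
Point = Tuple[float, float]


def chain_flows(flows: List[Flow], start: Point) -> List[Optional[Point]]:
    """Follow one pixel through consecutive two-frame flow fields.

    Returns one (x, y) position per frame (len(flows) + 1 entries).
    Entries become None from the first frame in which the point lies
    outside the image. Flow is sampled at the nearest pixel for simplicity.
    """
    trajectory: List[Optional[Point]] = [start]
    x, y = start
    alive = True
    for flow in flows:
        if alive:
            height, width = len(flow), len(flow[0])
            xi, yi = round(x), round(y)
            if 0 <= xi < width and 0 <= yi < height:
                dx, dy = flow[yi][xi]
                x, y = x + dx, y + dy
                # The point is lost if it lands outside the next frame.
                alive = 0 <= round(x) < width and 0 <= round(y) < height
            else:
                alive = False
        trajectory.append((x, y) if alive else None)
    return trajectory


if __name__ == "__main__":
    # A 3-row, 4-column image in which every pixel moves one pixel right.
    uniform = [[(1.0, 0.0)] * 4 for _ in range(3)]
    print(chain_flows([uniform] * 4, (1.0, 1.0)))
```

In this toy setting the point is followed until it exits the right edge of the image, after which the trajectory holds `None`. A trajectory-based formulation would instead keep estimating a position for the point in every frame, including frames where it is not visible.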