CV · Mar 9, 2025

Online Dense Point Tracking with Streaming Memory

arXiv:2503.06471v2 · 2 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient and accurate long-range point tracking for video analysis, though it appears incremental as it builds on existing tracking methods with improvements in speed and memory usage.

The paper tackles the problem of dense point tracking in videos by proposing the SPOT framework, which achieves state-of-the-art accuracy on the CVO benchmark and operates at least 2× faster with 10× fewer parameters than previous models.

Dense point tracking is a challenging task that requires continuously tracking every point in the initial frame throughout a substantial portion of a video, even in the presence of occlusions. Traditional methods use optical flow models to estimate long-range motion directly, but because they do not consider temporal consistency, they often suffer from appearance drift. Recent point tracking algorithms usually depend on sliding windows to propagate information indirectly from the first frame to the current one, which is slow and less effective for long-range tracking. To account for temporal consistency and enable efficient information propagation, we present SPOT, a lightweight and fast model with Streaming memory for dense POint Tracking and online video processing. The SPOT framework features three core components: a customized memory reading module for feature enhancement, a sensory memory for short-term motion dynamics modeling, and a visibility-guided splatting module for accurate information propagation. This combination enables SPOT to perform dense point tracking with state-of-the-art accuracy on the CVO benchmark, and to match or exceed offline models on sparse tracking benchmarks such as TAP-Vid and RoboTAP. Notably, SPOT has 10× fewer parameters and operates at least 2× faster than previous state-of-the-art models while maintaining the best performance on CVO. We will release the models and code at: https://dqiaole.github.io/SPOT/.
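
The abstract does not spell out how the visibility-guided splatting module works, so the following is only a rough illustration of the general idea, not the paper's implementation. It assumes that first-frame features are forward-splatted to their tracked positions with nearest-pixel rounding, and that each contribution is weighted by a visibility score so that occluded points do not overwrite visible ones. The function name `splat_features` and all of its parameters are hypothetical.

```python
import numpy as np

def splat_features(features, flow, visibility):
    """Forward-splat per-pixel features to their tracked positions.

    Illustrative assumption only, not SPOT's actual module.
    features:   (H, W, C) features stored for the first frame
    flow:       (H, W, 2) displacement (dx, dy) from first to current frame
    visibility: (H, W) weights in [0, 1]; occluded points get low weight
    Returns an (H, W, C) map aligned with the current frame.
    """
    H, W, C = features.shape
    ys, xs = np.mgrid[0:H, 0:W]

    # Target pixel of every source point (nearest-pixel rounding).
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)

    # Points that leave the image contribute nothing.
    inside = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
    w = visibility * inside

    # Clamp indices so they are valid; out-of-bounds points have zero weight.
    txc = np.clip(tx, 0, W - 1)
    tyc = np.clip(ty, 0, H - 1)

    out = np.zeros((H, W, C), dtype=np.float64)
    weight = np.zeros((H, W), dtype=np.float64)

    # Unbuffered accumulation: several sources may land on the same target.
    np.add.at(out, (tyc, txc), features * w[..., None])
    np.add.at(weight, (tyc, txc), w)

    # Visibility-weighted average; targets nobody reached stay zero.
    return out / np.maximum(weight, 1e-8)[..., None]
```

When a visible point and an occluded point land on the same target pixel, the weighted average in this sketch keeps the visible point's feature, which is the behaviour that visibility guidance is meant to provide. A real module would more likely use bilinear or softmax splatting and learned visibility scores.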
