CV · Jun 10, 2025

MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning

arXiv:2506.08694v2 · 2 citations · h-index: 67 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of inconsistent feature learning in dynamic scenes for video analysis, offering incremental improvements over existing methods.

The paper tackles the challenge of dense self-supervised learning in videos by proposing a motion-guided framework that clusters dense point tracks to learn spatiotemporally consistent representations, improving state-of-the-art by 1% to 6% on six image and video datasets across four benchmarks.

Dense self-supervised learning has shown great promise for learning pixel- and patch-level representations, but extending it to videos remains challenging due to the complexity of motion dynamics. Existing approaches struggle as they rely on static augmentations that fail under object deformations, occlusions, and camera movement, leading to inconsistent feature learning over time. We propose a motion-guided self-supervised learning framework that clusters dense point tracks to learn spatiotemporally consistent representations. By leveraging an off-the-shelf point tracker, we extract long-range motion trajectories and optimize feature clustering through a momentum-encoder-based optimal transport mechanism. To ensure temporal coherence, we propagate cluster assignments along tracked points, enforcing feature consistency across views despite viewpoint changes. Integrating motion as an implicit supervisory signal, our method learns representations that generalize across frames, improving robustness in dynamic scenes and challenging occlusion scenarios. By initializing from strong image-pretrained models and leveraging video data for training, we improve state-of-the-art by 1% to 6% on six image and video datasets and four evaluation benchmarks. The implementation is publicly available at our GitHub repository: https://github.com/SMSD75/MoSiC/tree/main
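The clustering and propagation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a SwAV-style Sinkhorn-Knopp normalization as the optimal transport step, a cross-entropy loss on the student side, and point features that have already been sampled along tracks from some external point tracker. All names (`sinkhorn`, `track_consistency_loss`, `prototypes`, `visible`) and the values of `eps` and `temp` are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Balanced soft assignment of N points to K clusters (Sinkhorn-Knopp).

    scores: (N, K) similarities between point features and cluster prototypes.
    Returns an (N, K) matrix whose rows each sum to 1.
    """
    q = np.exp((scores - scores.max()) / eps).T  # (K, N), shifted for stability
    q /= q.sum()
    k, n = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)  # equalize the mass given to each cluster
        q /= k
        q /= q.sum(axis=0, keepdims=True)  # equalize the mass given to each point
        q /= n
    return (q * n).T


def track_consistency_loss(student_feats, teacher_feats, prototypes, visible, temp=0.1):
    """Cross-entropy between propagated teacher assignments and student predictions.

    student_feats: (T, N, D) student features sampled at N tracked points in T frames.
    teacher_feats: (N, D) momentum-encoder features of the same points in a reference frame.
    prototypes:    (K, D) cluster prototypes.
    visible:       (T, N) boolean mask, False where the tracker reports occlusion.
    """
    # Assumption: targets come from the reference frame only and are treated as constants.
    targets = sinkhorn(teacher_feats @ prototypes.T)  # (N, K)
    logits = student_feats @ prototypes.T / temp  # (T, N, K)
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Propagation along tracks: every frame shares the target of its tracked point.
    ce = -(targets[None] * log_p).sum(axis=-1)  # (T, N)
    return (ce * visible).sum() / max(visible.sum(), 1)


# Toy usage with random stand-ins for real encoder features and tracker output.
rng = np.random.default_rng(0)
T, N, D, K = 4, 32, 16, 8
student = l2_normalize(rng.normal(size=(T, N, D)))
teacher = l2_normalize(rng.normal(size=(N, D)))
protos = l2_normalize(rng.normal(size=(K, D)))
vis = rng.random((T, N)) > 0.2
loss = track_consistency_loss(student, teacher, protos, vis)
```

In this sketch the occlusion mask simply drops occluded points from the average, and the targets are computed once per clip. A training loop would additionally update the teacher as an exponential moving average of the student, which is omitted here.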

Code Implementations: 1 repo