CV · Aug 12, 2022

Motion Sensitive Contrastive Learning for Self-supervised Video Representation

arXiv:2208.06105v1 · 21 citations · h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses the need for better motion-aware features in video understanding tasks, offering incremental improvements over existing contrastive learning methods.

The paper targets the insufficient exploitation of short-term motion dynamics in self-supervised video representation learning. It proposes Motion Sensitive Contrastive Learning (MSCL), which injects motion information captured by optical flows into RGB frames. With a 3D ResNet-18 backbone, it reports top-1 accuracies of 91.5% on UCF101 and 50.3% on Something-Something v2 for video classification, and a 65.6% top-1 recall on UCF101 for video retrieval.

Contrastive learning has shown great potential in video representation learning. However, existing approaches fail to sufficiently exploit short-term motion dynamics, which are crucial to various downstream video understanding tasks. In this paper, we propose Motion Sensitive Contrastive Learning (MSCL), which injects the motion information captured by optical flows into RGB frames to strengthen feature learning. To achieve this, in addition to clip-level global contrastive learning, we develop Local Motion Contrastive Learning (LMCL) with frame-level contrastive objectives across the two modalities. Moreover, we introduce Flow Rotation Augmentation (FRA) to generate extra motion-shuffled negative samples and Motion Differential Sampling (MDS) to accurately screen training samples. Extensive experiments on standard benchmarks validate the effectiveness of the proposed method. With the commonly used 3D ResNet-18 as the backbone, we achieve top-1 accuracies of 91.5% on UCF101 and 50.3% on Something-Something v2 for video classification, and a 65.6% top-1 recall on UCF101 for video retrieval, notably improving the state of the art.
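
As a rough illustration of what a frame-level contrastive objective across the RGB and optical-flow modalities could look like, the sketch below computes an InfoNCE-style loss in NumPy. It is not the authors' implementation: the function name `frame_level_info_nce`, the `temperature` value, and the choice to draw negatives only from other frames of the same clip are all assumptions made for illustration.

```python
import numpy as np

def frame_level_info_nce(rgb_feats, flow_feats, temperature=0.1):
    """InfoNCE-style loss between per-frame RGB and flow features.

    rgb_feats, flow_feats: arrays of shape (T, D), one feature per frame.
    Assumption: RGB frame t and flow frame t form the positive pair;
    all other flow frames in the clip act as negatives.
    """
    # L2-normalise so the dot product is a cosine similarity.
    rgb = rgb_feats / np.linalg.norm(rgb_feats, axis=1, keepdims=True)
    flow = flow_feats / np.linalg.norm(flow_feats, axis=1, keepdims=True)

    # (T, T) similarity matrix; the diagonal holds the positive pairs.
    logits = rgb @ flow.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the diagonal as the target class.
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions, temporally aligned RGB and flow features should give a lower loss than the same features with the flow frames shifted in time, which is the behaviour a motion-sensitive objective is meant to reward.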

