CV · LG · Oct 28, 2020

Cycle-Contrast for Self-Supervised Video Representation Learning

arXiv:2010.14810v1 · 57 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of learning effective video representations without labeled data for video understanding tasks. Its approach is novel, but the improvements over existing methods are incremental.

The paper tackles self-supervised video representation learning by proposing Cycle-Contrastive Learning (CCL), which learns correspondences across frames and videos to improve transfer to downstream tasks. The authors report that CCL outperforms previous methods in nearest neighbour retrieval and action recognition on UCF101, HMDB51, and MMAct.

We present Cycle-Contrastive Learning (CCL), a novel self-supervised method for learning video representations. Building on the natural inclusion relation between a video and its frames (each frame belongs to its video), CCL is designed to find correspondences across frames and videos while keeping the representations contrastive within the frame domain and the video domain respectively. This differs from recent approaches that learn correspondences only across frames or clips. In our method, the frame and video representations are learned by a single network based on an R3D architecture, with a shared non-linear transformation that embeds both frame and video features before the cycle-contrastive loss. We demonstrate that the video representation learned by CCL transfers well to downstream video understanding tasks, outperforming previous methods in nearest neighbour retrieval and action recognition on UCF101, HMDB51 and MMAct.
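To make the idea in the abstract more concrete, the following is a minimal NumPy sketch of a shared projection head followed by a video-to-frame-to-video cycle loss. It is an illustration under stated assumptions, not the paper's actual formulation: the two-layer ReLU head, the soft nearest-neighbour step, the temperature value, and all function names (`shared_projection`, `cycle_contrastive_loss`) are hypothetical, and the R3D backbone is replaced by precomputed feature arrays.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def shared_projection(x, w1, b1, w2, b2):
    """Hypothetical shared non-linear head: the same two-layer ReLU MLP
    is applied to frame features and to video features."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return l2_normalize(hidden @ w2 + b2)


def cycle_contrastive_loss(video_emb, frame_emb, temperature=0.1):
    """Assumed cycle loss (not taken from the paper).

    video_emb: (V, D) unit-norm video embeddings.
    frame_emb: (F, D) unit-norm frame embeddings from all videos.
    """
    # Forward step: each video soft-selects its nearest frame
    # by attending over every frame in the batch.
    attention = np.exp(log_softmax(video_emb @ frame_emb.T / temperature))
    soft_frame = l2_normalize(attention @ frame_emb)  # (V, D)

    # Backward step: the selected frame is compared against all videos;
    # the cycle is consistent when it lands on the video it started from.
    back_log_prob = log_softmax(soft_frame @ video_emb.T / temperature)
    return float(-np.mean(np.diag(back_log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w1, b1 = rng.normal(size=(16, 8)), rng.normal(size=8)
    w2, b2 = rng.normal(size=(8, 4)), rng.normal(size=4)

    # Stand-ins for backbone outputs: 3 videos and 12 frames.
    video_feat = rng.normal(size=(3, 16))
    frame_feat = rng.normal(size=(12, 16))

    videos = shared_projection(video_feat, w1, b1, w2, b2)
    frames = shared_projection(frame_feat, w1, b1, w2, b2)
    print(cycle_contrastive_loss(videos, frames))
```

In this sketch the loss is small when each video's embedding sits close to its own frames and far from other videos, which is the kind of frame-video correspondence the abstract describes. A full training setup would additionally need the backbone, a frame sampling strategy, and whatever extra contrastive terms the paper defines.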

