CV | Sep 2, 2022

Temporal Contrastive Learning with Curriculum

arXiv:2209.00760v1 | 4 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses video understanding for action recognition and retrieval, but it is incremental: it builds on existing contrastive learning and curriculum learning techniques.

The authors tackled video representation learning by introducing ConCur, a contrastive method that uses curriculum learning to dynamically sample clips from easy to hard positives, achieving state-of-the-art performance on UCF101 and HMDB51 datasets for action recognition and retrieval tasks.

We present ConCur, a contrastive video representation learning method that uses curriculum learning to impose a dynamic sampling strategy in contrastive training. More specifically, ConCur starts the contrastive training with easy positive samples (temporally close and semantically similar clips), and as training progresses, it increases the temporal span, effectively sampling hard positives (temporally distant and semantically dissimilar clips). To learn better context-aware representations, we also propose an auxiliary task of predicting the temporal distance between a positive pair of clips. We conduct extensive experiments on two popular action recognition datasets, UCF101 and HMDB51, on which our proposed method achieves state-of-the-art performance on two benchmark tasks: video action recognition and video retrieval. We explore the impact of encoder backbones and pre-training strategies by using R(2+1)D and C3D encoders and pre-training on the Kinetics-400 and Kinetics-200 datasets. Moreover, a detailed ablation study shows the effectiveness of each component of our proposed method.
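
The easy-to-hard sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear schedule, the function names (`max_span`, `sample_positive_pair`), and the parameters (`min_span`, `clip_len`) are assumptions made for illustration. The idea is that the allowed temporal gap between the two clips of a positive pair grows with the epoch, and the gap itself could serve as the target for the auxiliary temporal-distance task.

```python
import random


def max_span(epoch, total_epochs, min_span, full_span):
    """Largest allowed frame gap between two positive clips at this epoch.

    Assumption: the span grows linearly from min_span to full_span.
    The paper's actual schedule may differ.
    """
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return int(round(min_span + progress * (full_span - min_span)))


def sample_positive_pair(num_frames, clip_len, epoch, total_epochs,
                         min_span=8, rng=random):
    """Sample start frames of two clips from the same video.

    Returns (start_a, start_b, distance). Early epochs keep the clips
    close together (easy positives); later epochs allow distant clips
    (hard positives). `distance` is a possible regression or
    classification target for the temporal-distance auxiliary task.
    """
    last_start = num_frames - clip_len
    if last_start < 0:
        raise ValueError("video is shorter than one clip")
    # Never allow a gap larger than the video can accommodate.
    span = min(max_span(epoch, total_epochs, min_span, last_start), last_start)
    start_a = rng.randint(0, last_start)
    start_b = rng.randint(max(0, start_a - span), min(last_start, start_a + span))
    return start_a, start_b, abs(start_b - start_a)
```

For a 100-frame video with 16-frame clips and 10 epochs, this hypothetical schedule limits the gap to 8 frames in the first epoch and to 84 frames (the whole video) in the last one.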

