CV, Aug 3, 2020

Memory-augmented Dense Predictive Coding for Video Representation Learning

arXiv:2008.01065v1, 262 citations
Originality: Incremental advance
AI Analysis

The paper addresses video representation learning for action recognition. It offers an efficient method that applies to tasks such as retrieval and classification, though the advance appears incremental because it builds on existing predictive coding approaches.

The paper tackles self-supervised learning from video for action recognition by proposing Memory-augmented Dense Predictive Coding (MemDPC), which achieves state-of-the-art or comparable performance on four downstream tasks with significantly less training data.

The objective of this paper is self-supervised learning from video, in particular of representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task. It is trained with a predictive attention mechanism over the set of compressed memories, such that any future state can always be constructed by a convex combination of the condensed representations, allowing the model to make multiple hypotheses efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representation on four different downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all cases, we demonstrate state-of-the-art or comparable performance over other approaches with orders of magnitude less training data.
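The "convex combination of the condensed representations" in contribution (i) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `predict_from_memory`, the linear query projection `W_query`, and all sizes are assumptions chosen for illustration, and random arrays stand in for what would be learned parameters and encoder outputs. The sketch only shows that softmax attention weights over a memory bank are non-negative and sum to one, so each predicted future state is a convex combination of the memory slots.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def predict_from_memory(context, W_query, memory):
    """Predict a future feature as a convex combination of memory slots.

    context: (batch, d_ctx) summary of the observed past (hypothetical input)
    W_query: (d_ctx, k) projection to one logit per memory slot (assumed linear)
    memory:  (k, d_feat) bank of compressed memory vectors
    """
    logits = context @ W_query           # (batch, k): one score per memory slot
    weights = softmax(logits, axis=-1)   # non-negative, each row sums to 1
    prediction = weights @ memory        # (batch, d_feat): weighted sum of slots
    return prediction, weights


# Illustrative sizes and random values only; a real model would learn these.
rng = np.random.default_rng(0)
batch, d_ctx, k, d_feat = 4, 16, 8, 32
context = rng.normal(size=(batch, d_ctx))
W_query = rng.normal(size=(d_ctx, k))
memory = rng.normal(size=(k, d_feat))

prediction, weights = predict_from_memory(context, W_query, memory)
```

Because the weights form a probability distribution over slots, every coordinate of `prediction` lies between the smallest and largest value of that coordinate across the memory bank. Different contexts select different mixtures of the same slots, which is one way to read the abstract's claim about making multiple hypotheses efficiently.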

Code Implementations: 1 repo
