CV · Aug 3, 2020

Memory-augmented Dense Predictive Coding for Video Representation Learning

arXiv:2008.01065v132.7262 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses video representation learning for action recognition and offers an efficient method that applies to tasks such as retrieval and classification, though it appears incremental because it builds on existing predictive coding approaches.

The paper tackles self-supervised learning from video for action recognition by proposing Memory-augmented Dense Predictive Coding (MemDPC), which achieves state-of-the-art or comparable performance on four downstream tasks with significantly less training data.

The objective of this paper is self-supervised learning from video, in particular of representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task. It is trained with a predictive attention mechanism over the set of compressed memories, such that any future state can always be constructed as a convex combination of the condensed representations, allowing multiple hypotheses to be made efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representation on four different downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all cases, we demonstrate state-of-the-art or comparable performance relative to other approaches, with orders of magnitude less training data.
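The convex-combination idea in contribution (i) can be illustrated with a minimal sketch. It assumes that a predictor emits one score (logit) per memory slot and that a softmax turns those scores into attention weights. The function names, the toy memory bank, and the logits are hypothetical, and this is not the authors' implementation. Because the weights are non-negative and sum to 1, the predicted future state always lies inside the convex hull of the memory entries.

```python
import math

def softmax(logits):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_from_memory(logits, memory):
    """Return (weights, prediction), where
    prediction = sum_i weights[i] * memory[i] is a convex combination
    of the memory slots.

    `logits` holds one score per slot (assumed to come from a predictor
    network); `memory` is a list of equal-length feature vectors.
    """
    if len(logits) != len(memory):
        raise ValueError("need exactly one logit per memory slot")
    weights = softmax(logits)
    dim = len(memory[0])
    prediction = [
        sum(w * slot[d] for w, slot in zip(weights, memory))
        for d in range(dim)
    ]
    return weights, prediction

# Hypothetical toy values: 3 memory slots, each a 2-dimensional feature.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, 0.5, -1.0]
weights, prediction = predict_from_memory(logits, memory)
```

Changing the logits moves the prediction between memory entries without changing the memory itself, which is one way to read the abstract's claim that multiple hypotheses can be made efficiently.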

