CV · Sep 6, 2018

Unsupervised Learning of View-invariant Action Representations

arXiv:1809.01844v1 · 110 citations
Originality: Incremental advance
AI Analysis

The work addresses the need for scalable action recognition without labeled data, though it is an incremental advance because it builds on existing video representation learning methods.

The paper tackles the cost of manual labeling for human action recognition. It proposes an unsupervised framework that learns view-invariant action representations by predicting 3D motion in multiple target views from a source view, and demonstrates the effectiveness of these representations on multiple datasets.

Recent successes in human action recognition with deep learning mostly adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, collecting labels is an expensive and time-consuming process. In this work, we propose an unsupervised learning framework that exploits unlabeled data to learn video representations. Unlike previous work in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using the video representation from a source view. By learning to extrapolate cross-view motions, the representation can capture view-invariant motion dynamics that are discriminative for the action. In addition, we propose a view-adversarial training method to enhance the learning of view-invariant features. We demonstrate the effectiveness of the learned representations for action recognition on multiple datasets.
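
The abstract does not say how the cross-view prediction and view-adversarial objectives are implemented. As a rough illustration only, the sketch below assumes a gradient-reversal layer (the trick used in domain-adversarial training) placed between the encoder and a view classifier, and replaces the encoder, the 3D motion predictor, and the view classifier with hypothetical one-parameter scalar stand-ins (`w_enc`, `w_pred`, `w_view`). It is not the authors' implementation; it only shows how such a reversal lets the view classifier minimize the view loss while the encoder is pushed to maximize it, which is one way to encourage view-invariant features.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy model: every module is a single scalar weight, chosen only
# to make the gradient flow easy to follow. Not the paper's architecture.
def losses(w_enc, w_pred, w_view, x, target_motion, view_label):
    h = w_enc * x                          # "representation" of the source-view input
    pred = w_pred * h                      # stand-in for the cross-view 3D motion predictor
    l_pred = (pred - target_motion) ** 2   # squared error on the target-view motion
    p = sigmoid(w_view * h)                # view classifier: P(view_label == 1)
    l_view = -(view_label * math.log(p) + (1 - view_label) * math.log(1 - p))
    return l_pred, l_view


def grads_with_reversal(w_enc, w_pred, w_view, x, target_motion, view_label, lam):
    """Hand-derived gradients, assuming a gradient-reversal layer between the
    encoder output h and the view classifier (identity forward, -lam * grad backward)."""
    h = w_enc * x
    pred = w_pred * h
    d_pred = 2.0 * (pred - target_motion)  # dL_pred / d pred
    p = sigmoid(w_view * h)
    d_logit = p - view_label               # dL_view / d logit (logistic loss)

    g_w_pred = d_pred * h                  # predictor minimizes L_pred
    g_w_view = d_logit * h                 # view classifier minimizes L_view as usual

    g_h_pred = d_pred * w_pred             # gradient reaching h from the predictor
    g_h_view = d_logit * w_view            # gradient arriving at the reversal layer
    g_h = g_h_pred - lam * g_h_view        # reversal: flip the sign and scale by lam
    g_w_enc = g_h * x                      # encoder is pushed to confuse the classifier
    return g_w_enc, g_w_pred, g_w_view
```

With the reversal in place, `g_w_enc` should match a finite-difference estimate of the derivative of `l_pred - lam * l_view` with respect to `w_enc`, while `g_w_view` matches the derivative of `l_view` alone; comparing against `losses` in this way is a quick check on the hand-derived gradients.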

Code Implementations: 1 repo
