CV · Jan 7, 2017

Unsupervised Learning of Long-Term Motion Dynamics for Videos

arXiv:1701.01821v3 · 196 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of learning robust video representations for activity recognition in computer vision. Its contribution is incremental, as it builds on existing encoder-decoder and flow-based methods.

The paper tackles unsupervised learning of long-term motion dynamics from videos by training an encoder-decoder framework to predict sequences of 3D flows. It demonstrates the effectiveness of the learned representation for activity classification on datasets such as NTU RGB+D and MSR Daily Activity 3D.

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network based Encoder-Decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatial-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, Depth, and RGB-D videos.
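To make the pipeline in the abstract concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation. It assumes that an "atomic" 3D flow can be approximated by back-projecting two depth maps with a pinhole camera model and differencing corresponding 3D points, where correspondences come from a given 2D flow rounded to the nearest pixel. It also assumes a toy vanilla-RNN encoder-decoder with random, untrained weights. Every function name (`backproject`, `atomic_3d_flow`, `init_params`, `encode_decode`, `reconstruction_loss`), the camera intrinsics, and the layer sizes are hypothetical. The paper's actual flow computation and network layers may differ.

```python
import numpy as np

# Hypothetical sketch: not the paper's code; intrinsics, sizes, and names are assumptions.

def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth z to a camera-space point (X, Y, Z) via a pinhole model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    z = depth.astype(np.float64)
    return np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)  # (h, w, 3)

def atomic_3d_flow(depth_t, depth_t1, flow_2d, intrinsics):
    """3D displacement of each pixel's point from frame t to t+1.
    Correspondences come from a given 2D flow (u, v offsets), rounded to the nearest pixel."""
    fx, fy, cx, cy = intrinsics
    h, w = depth_t.shape
    p_t = backproject(depth_t, fx, fy, cx, cy)
    p_t1 = backproject(depth_t1, fx, fy, cx, cy)
    v, u = np.mgrid[0:h, 0:w]
    u1 = np.clip(np.rint(u + flow_2d[..., 0]).astype(int), 0, w - 1)
    v1 = np.clip(np.rint(v + flow_2d[..., 1]).astype(int), 0, h - 1)
    return p_t1[v1, u1] - p_t  # (h, w, 3)

def init_params(in_dim, hid_dim, out_dim, rng, scale=0.1):
    """Random (untrained) encoder, recurrent decoder, and output weights."""
    return (scale * rng.standard_normal((hid_dim, in_dim)),
            scale * rng.standard_normal((hid_dim, hid_dim)),
            scale * rng.standard_normal((out_dim, hid_dim)))

def encode_decode(frame_pair, n_steps, params):
    """Encode a frame pair into one hidden state, then unroll a vanilla RNN decoder
    that emits one flattened 3D-flow field per future time step."""
    w_enc, w_dec, w_out = params
    h = np.tanh(w_enc @ frame_pair.ravel())  # compact representation of the clip
    preds = []
    for _ in range(n_steps):
        h = np.tanh(w_dec @ h)
        preds.append(w_out @ h)
    return np.stack(preds)  # (n_steps, out_dim)

def reconstruction_loss(preds, target_flows):
    """Mean squared error between predicted and target 3D-flow sequences."""
    return float(np.mean((preds - target_flows.reshape(preds.shape)) ** 2))

# Toy usage: two RGB-D frames (R, G, B, depth) and T target 3D flows.
rng = np.random.default_rng(0)
H, W, T = 4, 5, 3
frames = rng.random((2, H, W, 4))
depths = 1.0 + rng.random((T + 1, H, W))
flows_2d = np.zeros((T, H, W, 2))  # placeholder 2D flow: no pixel motion
intrinsics = (2.0, 2.0, 2.0, 1.5)  # hypothetical fx, fy, cx, cy
targets = np.stack([atomic_3d_flow(depths[t], depths[t + 1], flows_2d[t], intrinsics)
                    for t in range(T)])
params = init_params(frames.size, 16, H * W * 3, rng)
preds = encode_decode(frames, T, params)
print(preds.shape, reconstruction_loss(preds, targets))
```

The design mirrors the argument in the abstract. The decoder sees only the encoder's hidden state, so any information needed to reconstruct the flow sequence must pass through that representation. In practice one would minimize `reconstruction_loss` with an autodiff framework and a gradient-based optimizer. Afterwards, one would reuse the encoder's hidden state as a feature for activity classification. Neither step is shown here.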

