CV · Jan 7, 2017

Unsupervised Learning of Long-Term Motion Dynamics for Videos

Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

arXiv:1701.01821v3 22.1196 citations

Originality: Incremental advance

AI Analysis

This work addresses learning robust video representations for activity recognition. It is an incremental advance, since it builds on existing encoder-decoder and flow-based methods.

The paper tackles unsupervised learning of long-term motion dynamics from videos by predicting sequences of 3D flows with an encoder-decoder framework. It demonstrates the learned representation on activity classification with datasets such as NTU RGB+D and MSR Daily Activity 3D.

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network based Encoder-Decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatial-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, Depth, and RGB-D videos.
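The encoder-decoder idea in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' architecture, which the abstract does not specify. It assumes a vanilla (Elman) RNN with random, untrained weights, frames already reduced to short feature vectors, and each "atomic 3D flow" collapsed to a single 3-vector. All names (`ToyFlowSeq2Seq`, `rnn_step`, `sequence_loss`) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows (illustrative init)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_h, x, h):
    """Vanilla recurrent update: h' = tanh(W_in x + W_h h)."""
    a = matvec(w_in, x)
    b = matvec(w_h, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


class ToyFlowSeq2Seq:
    """Toy encoder-decoder: frame features in, a sequence of flow vectors out."""

    def __init__(self, frame_dim, hidden_dim, flow_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.flow_dim = flow_dim
        self.enc_in = make_matrix(hidden_dim, frame_dim, rng)
        self.enc_h = make_matrix(hidden_dim, hidden_dim, rng)
        self.dec_in = make_matrix(hidden_dim, flow_dim, rng)
        self.dec_h = make_matrix(hidden_dim, hidden_dim, rng)
        self.out = make_matrix(flow_dim, hidden_dim, rng)

    def encode(self, frames):
        """Fold the input frames into one hidden state (the representation)."""
        h = [0.0] * self.hidden_dim
        for x in frames:
            h = rnn_step(self.enc_in, self.enc_h, x, h)
        return h

    def decode(self, h, steps):
        """Unroll the decoder, feeding each predicted flow back as input."""
        flow = [0.0] * self.flow_dim
        flows = []
        for _ in range(steps):
            h = rnn_step(self.dec_in, self.dec_h, flow, h)
            flow = matvec(self.out, h)
            flows.append(flow)
        return flows


def sequence_loss(predicted, target):
    """Mean squared error over every step and flow component."""
    count = sum(len(p) for p in predicted)
    total = sum((a - b) ** 2
                for p, t in zip(predicted, target)
                for a, b in zip(p, t))
    return total / count


# Usage: encode a pair of (made-up) frame feature vectors, predict 5 flows.
data_rng = random.Random(1)
frame_pair = [[data_rng.random() for _ in range(8)] for _ in range(2)]
model = ToyFlowSeq2Seq(frame_dim=8, hidden_dim=16, flow_dim=3)
code = model.encode(frame_pair)
predicted_flows = model.decode(code, steps=5)
```

In a trained version of such a model, minimizing a loss like `sequence_loss` against ground-truth flows is what would pressure `encode` to produce a representation that carries long-term motion information. The hidden state `code` is the part that would then be reused for activity classification.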
