CV | Apr 20, 2019

Cubic LSTMs for Video Prediction

arXiv:1904.09412v16.515 citations | h-index: 138

Originality: Incremental advance

AI Analysis

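This paper addresses video prediction, a problem relevant to both computer vision and robotics. The contribution is incremental because it builds on existing LSTM architectures.

The authors tackle video frame prediction by proposing a CubicLSTM unit with three branches: a spatial branch that captures moving objects, a temporal branch that processes motion, and an output branch that combines the two to generate predicted frames. The resulting network, CubicRNN, is reported to produce more accurate predictions than prior methods on both synthetic and real-world datasets.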

Predicting future frames in videos has become a promising direction of research for both the computer vision and robot learning communities. The core of this problem involves moving object capture and future motion prediction. While object capture specifies which objects are moving in videos, motion prediction describes their future dynamics. Motivated by this analysis, we propose a Cubic Long Short-Term Memory (CubicLSTM) unit for video prediction. CubicLSTM consists of three branches, i.e., a spatial branch for capturing moving objects, a temporal branch for processing motions, and an output branch for combining the first two branches to generate predicted frames. Stacking multiple CubicLSTM units along the spatial branch and output branch, and then evolving along the temporal branch, forms a cubic recurrent neural network (CubicRNN). Experiments show that CubicRNN produces more accurate video predictions than prior methods on both synthetic and real-world datasets.
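The three-branch idea can be illustrated with a toy sketch. This is not the paper's formulation: the abstract gives no gate equations, so the code below assumes standard LSTM updates, uses dense vectors in place of image feature maps, and the names `ToyCubicCell`, `lstm_update`, `W_t`, `W_s`, and `W_o` are all hypothetical. It only shows the data flow of two separately carried states (temporal and spatial) feeding one combined output.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_update(z, c):
    """Standard LSTM update from stacked pre-activations z (4 * hidden)."""
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new


class ToyCubicCell:
    """Hypothetical three-branch cell; an assumption, not the authors' CubicLSTM."""

    def __init__(self, input_size, hidden, seed=0):
        rng = np.random.default_rng(seed)
        concat = input_size + 2 * hidden
        self.W_t = rng.normal(0.0, 0.1, (4 * hidden, concat))      # temporal branch
        self.W_s = rng.normal(0.0, 0.1, (4 * hidden, concat))      # spatial branch
        self.W_o = rng.normal(0.0, 0.1, (input_size, 2 * hidden))  # output branch

    def step(self, x, temporal, spatial):
        h_t, c_t = temporal
        h_s, c_s = spatial
        # Both branches see the input and both hidden states.
        v = np.concatenate([x, h_t, h_s])
        h_t, c_t = lstm_update(self.W_t @ v, c_t)
        h_s, c_s = lstm_update(self.W_s @ v, c_s)
        # The output branch combines the two updated hidden states.
        y = self.W_o @ np.concatenate([h_t, h_s])
        return y, (h_t, c_t), (h_s, c_s)


cell = ToyCubicCell(input_size=8, hidden=4)
zero_state = (np.zeros(4), np.zeros(4))
y, temporal, spatial = cell.step(np.ones(8), zero_state, zero_state)
print(y.shape, temporal[0].shape, spatial[0].shape)
```

In this sketch the temporal state would be passed to the next time step and the spatial state to the next stacked unit; how the paper routes each state is not specified in the abstract.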
