CV · Nov 5, 2019

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

arXiv:1911.01655v1 · 150 citations
Originality: Incremental advance
AI Analysis

This work addresses video prediction, a problem relevant to applications such as robotics and autonomous systems. It is an incremental advance because it builds on existing methods by scaling up the models rather than introducing a new mechanism.

The authors approach future-frame prediction by questioning whether complex, handcrafted architectures are necessary. Instead, they propose minimizing inductive bias while maximizing network capacity, and they report state-of-the-art performance on three datasets covering object interactions, human motion, and car driving.

Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures with highly specialized computation, including segmentation masks, optical flow, and foreground and background separation. In this work, we question whether such handcrafted architectures are necessary and instead propose a different approach: finding minimal inductive bias for video prediction while maximizing network capacity. We investigate this question by performing the first large-scale empirical study and demonstrate state-of-the-art performance by learning large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving.
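To make "minimal inductive bias, maximal capacity" concrete, here is a toy NumPy sketch of a generic stochastic recurrent frame predictor. It has no optical flow, segmentation masks, or foreground/background separation: only an encoder, a latent variable sampled from a learned Gaussian prior (via the reparameterization trick), a recurrent state, and a decoder. Capacity is controlled by a width multiplier `k`. This is not the authors' architecture. The layer choices, the names `init_params`, `step`, `rollout`, and `k`, and all sizes are illustrative assumptions, and the weights are random and untrained.

```python
import numpy as np

def init_params(frame_dim, k=1, base=16, z_dim=8, seed=0):
    """Random weights for a toy stochastic recurrent predictor (illustrative only).

    `k` is a hypothetical width multiplier: every hidden layer has base * k units.
    """
    rng = np.random.default_rng(seed)
    hid = base * k

    def w(n_in, n_out):
        return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

    return {
        "enc": w(frame_dim, hid),        # flattened frame -> feature
        "prior": w(hid, 2 * z_dim),      # recurrent state -> (mu, log_var) of latent z
        "rnn_x": w(hid + z_dim, hid),    # [feature, z] -> state
        "rnn_h": w(hid, hid),            # state -> state
        "dec": w(hid, frame_dim),        # state -> next frame
    }

def num_params(params):
    """Total number of weights, a rough proxy for network capacity."""
    return sum(p.size for p in params.values())

def step(frame, h, params, rng):
    """One prediction step: encode, sample z from the learned prior, update state, decode."""
    e = np.tanh(frame @ params["enc"])
    stats = h @ params["prior"]
    z_dim = stats.shape[-1] // 2
    mu, log_var = stats[..., :z_dim], stats[..., z_dim:]
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    h = np.tanh(np.concatenate([e, z], axis=-1) @ params["rnn_x"] + h @ params["rnn_h"])
    next_frame = 1.0 / (1.0 + np.exp(-(h @ params["dec"])))  # sigmoid keeps pixels in (0, 1)
    return next_frame, h

def rollout(frame, params, n_steps, seed=0):
    """Autoregressively predict n_steps future frames from a single context frame."""
    rng = np.random.default_rng(seed)
    h = np.zeros(params["rnn_h"].shape[0])
    frames = []
    for _ in range(n_steps):
        frame, h = step(frame, h, params, rng)
        frames.append(frame)
    return np.stack(frames)

if __name__ == "__main__":
    frame = np.random.default_rng(1).random(64)  # a flattened 8x8 "frame"
    for k in (1, 2, 4):
        print(k, num_params(init_params(64, k=k)))
    print(rollout(frame, init_params(64, k=1), n_steps=5).shape)
```

Because `z` is resampled at every step, different seeds yield different plausible futures from the same context frame, which is the point of a stochastic model. Raising `k` grows the parameter count without adding any specialized structure, mirroring (in miniature) the idea of trading handcrafted components for capacity.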

