CV · LG · Oct 3, 2016

Video Pixel Networks

arXiv:1610.00527v1 · 444 citations
Originality: Incremental advance
AI Analysis

This work addresses video generation and prediction, with applications such as robotics and simulation. It is rated an incremental improvement over prior state-of-the-art methods.

The authors propose the Video Pixel Network (VPN), a probabilistic model that estimates the discrete joint distribution of the raw pixel values in a video. On the Moving MNIST benchmark the VPN approaches the best possible performance, and its generated videos show only minor deviations from the ground truth.

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
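The "four-dimensional dependency chain" in the abstract can be read as a chain-rule factorization of the joint distribution over time, the two spatial axes, and color. The following is a minimal sketch of that idea only, not the VPN architecture: the ordering (frame, then row, then column, then channel) is an assumption consistent with the abstract, and `dependency_order`, `uniform_conditional`, and `joint_log_prob` are hypothetical names introduced for illustration. The uniform conditional is a placeholder where a trained network would supply p(value | everything earlier in the chain).

```python
import math
from itertools import product


def dependency_order(frames, height, width, channels=3):
    """Return (t, i, j, c) indices in the assumed generation order:
    time first, then rows, then columns, then color channels."""
    return list(product(range(frames), range(height), range(width), range(channels)))


def uniform_conditional(value, context):
    """Hypothetical stand-in for a learned conditional: a uniform
    distribution over the 256 discrete intensities, ignoring the context."""
    return 1.0 / 256


def joint_log_prob(video, conditional=uniform_conditional):
    """Chain-rule log-likelihood of a video given as nested lists
    indexed video[t][i][j][c]: the sum of log p(x_k | x_<k) along the chain."""
    frames = len(video)
    height = len(video[0])
    width = len(video[0][0])
    channels = len(video[0][0][0])

    context = []  # every value that precedes the current one in the chain
    total = 0.0
    for t, i, j, c in dependency_order(frames, height, width, channels):
        value = video[t][i][j][c]
        total += math.log(conditional(value, context))
        context.append(value)
    return total


# Toy video: 2 frames of 2x2 pixels with 3 channels, all zeros.
toy_video = [[[[0, 0, 0] for _ in range(2)] for _ in range(2)] for _ in range(2)]
toy_log_prob = joint_log_prob(toy_video)
```

Passing a different `conditional` function swaps in another model without changing the factorization. For a grayscale benchmark such as Moving MNIST, the innermost list would hold a single channel.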

Code Implementations: 1 repo