CVAILGOct 6, 2021

A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction

arXiv:2110.03446v118 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of handling predictive uncertainty in video prediction models, which is important for applications like autonomous systems, but it is incremental as it builds on prior latent prior estimation methods.

The paper tackles the problem of stochastic video prediction by introducing a hierarchical variational framework to quantify model predictive uncertainty, which is used to weight the MSE loss, leading to more effective training and improved video generation quality and diversity on benchmark datasets.

Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena. Prior approaches to solve this task typically estimate a latent prior characterizing this stochasticity, however do not account for the predictive uncertainty of the (deep learning) model. Such approaches often derive the training signal from the mean-squared error (MSE) between the generated frame and the ground truth, which can lead to sub-optimal training, especially when the predictive uncertainty is high. Towards this end, we introduce Neural Uncertainty Quantifier (NUQ) - a stochastic quantification of the model's predictive uncertainty, and use it to weigh the MSE loss. We propose a hierarchical, variational framework to derive NUQ in a principled manner using a deep, Bayesian graphical model. Our experiments on four benchmark stochastic video prediction datasets show that our proposed framework trains more effectively compared to the state-of-the-art models (especially when the training sets are small), while demonstrating better video generation quality and diversity against several evaluation metrics.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes