CV · RO · Mar 17, 2022

MSPred: Video Prediction at Multiple Spatio-Temporal Scales with Hierarchical Recurrent Networks

arXiv:2203.09303v4 · 13 citations · h-index: 57 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for long-term action planning in autonomous systems by enabling multi-scale video prediction, though the contribution appears incremental because it builds on existing hierarchical and recurrent approaches.

The paper addresses the short prediction horizon of existing video prediction models, which limits their usefulness for long-term planning in autonomous systems. It proposes MSPred, a model that forecasts future outcomes at multiple spatio-temporal scales. MSPred achieves competitive video frame prediction and accurately predicts high-level representations such as keypoints on bin-picking and action recognition datasets.

Autonomous systems not only need to understand their current environment, but should also be able to predict future actions conditioned on past states, for instance based on captured camera frames. However, existing models mainly focus on forecasting future video frames for short time horizons, and are hence of limited use for long-term action planning. We propose Multi-Scale Hierarchical Prediction (MSPred), a novel video prediction model able to simultaneously forecast possible future outcomes at different levels of granularity and at different spatio-temporal scales. By combining spatial and temporal downsampling, MSPred efficiently predicts abstract representations such as human poses or locations over long time horizons, while still maintaining competitive performance for video frame prediction. In our experiments, we demonstrate that MSPred accurately predicts future video frames as well as high-level representations (e.g. keypoints or semantics) on bin-picking and action recognition datasets, while consistently outperforming popular approaches for future frame prediction. Furthermore, we ablate different modules and design choices in MSPred, experimentally validating that combining features of different spatial and temporal granularity leads to superior performance. Code and models to reproduce our experiments can be found at https://github.com/AIS-Bonn/MSPred.
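The core idea in the abstract, combining spatial and temporal downsampling so that coarser levels of a recurrent hierarchy cover longer horizons, can be illustrated with a toy sketch. This is not the MSPred implementation (that lives in the linked repository). The class name `HierarchicalRecurrentSketch`, the scalar per-level states, the `tanh` update, the update periods (1, 4, 8) and the pooling factors (1, 2, 4) are all hypothetical choices made only for illustration.

```python
import math
from typing import List

Grid = List[List[float]]


def avg_pool(grid: Grid, factor: int) -> Grid:
    """Spatial downsampling: average non-overlapping factor x factor blocks."""
    h, w = len(grid), len(grid[0])
    assert h % factor == 0 and w % factor == 0, "grid size must be divisible by factor"
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [grid[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


class HierarchicalRecurrentSketch:
    """Hypothetical toy hierarchy (not MSPred's architecture): level k reads
    features pooled by spatial_factors[k] and updates its recurrent state
    only every periods[k] frames."""

    def __init__(self, periods: List[int], spatial_factors: List[int],
                 w: float = 0.5, u: float = 1.0):
        assert len(periods) == len(spatial_factors)
        self.periods = periods
        self.spatial_factors = spatial_factors
        self.w, self.u = w, u  # toy recurrent/input weights shared by all levels
        self.states = [0.0] * len(periods)  # one scalar state per level

    def step(self, t: int, frame: Grid) -> List[int]:
        """Process frame t; return the indices of the levels that updated."""
        updated = []
        for k, (p, s) in enumerate(zip(self.periods, self.spatial_factors)):
            if t % p != 0:
                continue  # temporal downsampling: coarser levels skip frames
            pooled = avg_pool(frame, s)
            # Collapse the pooled map to one scalar input for this toy state.
            x = sum(sum(r) for r in pooled) / (len(pooled) * len(pooled[0]))
            self.states[k] = math.tanh(self.w * self.states[k] + self.u * x)
            updated.append(k)
        return updated

    def horizon(self, num_steps: int) -> List[int]:
        """Frames spanned when each level is unrolled for num_steps predictions."""
        return [num_steps * p for p in self.periods]


# Usage: three levels, an 8x8 dummy frame, nine time steps.
model = HierarchicalRecurrentSketch(periods=[1, 4, 8], spatial_factors=[1, 2, 4])
frame = [[float(i + j) for j in range(8)] for i in range(8)]
schedule = [model.step(t, frame) for t in range(9)]
# schedule[0] == [0, 1, 2]; schedule[1] == [0]; schedule[4] == [0, 1]
print(model.horizon(5))  # [5, 20, 40]
```

With the same number of recurrent steps, the coarsest level in this sketch spans eight times as many frames as the finest one, while operating on a 16x smaller spatial map. This is the efficiency trade-off the abstract attributes to combining spatial and temporal downsampling.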

Code Implementations: 1 repo