LG, CV, ML | Oct 6, 2019

Structured Object-Aware Physics Prediction for Video Modeling and Planning

arXiv:1910.02425v2 | 77 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of enabling computers to understand and simulate physical interactions from video data. The advance is incremental: it builds on prior work, improving training by reusing the dynamics model for inference.

The paper tackles the problem of learning physical dynamics from videos without supervision. It introduces STOVE, a state-space model that reasons about objects and their interactions. STOVE predicts videos with convincing physical behavior over hundreds of timesteps, outperforms previous unsupervised models, and approaches the performance of supervised baselines.

When humans observe a physical system, they can easily locate objects, understand their interactions, and anticipate future behavior, even in settings with complicated and previously unseen interactions. For computers, however, learning such models from videos in an unsupervised fashion is an unsolved research problem. In this paper, we present STOVE, a novel state-space model for videos, which explicitly reasons about objects and their positions, velocities, and interactions. It is constructed by combining an image model and a dynamics model in a compositional manner and improves on previous work by reusing the dynamics model for inference, which accelerates and regularizes training. STOVE predicts videos with convincing physical behavior over hundreds of timesteps, outperforms previous unsupervised models, and even approaches the performance of supervised baselines. We further demonstrate the strength of our model as a simulator for sample-efficient model-based control in a task with heavily interacting objects.
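
To make the idea of an object-aware state-space rollout more concrete, here is a minimal sketch. It is not STOVE's implementation: it assumes each object's latent state is just a 2D position and velocity, it replaces the learned interaction model with a hand-written toy repulsive force, and it omits the image model entirely. All names (`ObjectState`, `pairwise_force`, `step`, `rollout`) and parameters (`strength`, `radius`, `dt`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectState:
    # Assumed per-object latent: 2D position and 2D velocity.
    pos: Tuple[float, float]
    vel: Tuple[float, float]


def pairwise_force(a: ObjectState, b: ObjectState,
                   strength: float = 0.5, radius: float = 1.0) -> Tuple[float, float]:
    """Toy repulsive force on `a` from `b`; a stand-in for a learned interaction model."""
    dx = a.pos[0] - b.pos[0]
    dy = a.pos[1] - b.pos[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist >= radius or dist == 0.0:
        return (0.0, 0.0)  # no interaction outside the radius
    scale = strength * (radius - dist) / dist
    return (scale * dx, scale * dy)


def step(states: List[ObjectState], dt: float = 0.1) -> List[ObjectState]:
    """One transition: sum pairwise effects per object, then integrate."""
    next_states = []
    for i, s in enumerate(states):
        fx = fy = 0.0
        for j, other in enumerate(states):
            if i != j:
                f = pairwise_force(s, other)
                fx += f[0]
                fy += f[1]
        vel = (s.vel[0] + dt * fx, s.vel[1] + dt * fy)
        pos = (s.pos[0] + dt * vel[0], s.pos[1] + dt * vel[1])
        next_states.append(ObjectState(pos, vel))
    return next_states


def rollout(states: List[ObjectState], n_steps: int, dt: float = 0.1) -> List[List[ObjectState]]:
    """Apply the transition repeatedly and return the initial state plus each predicted state."""
    trajectory = [states]
    for _ in range(n_steps):
        states = step(states, dt)
        trajectory.append(states)
    return trajectory
```

Calling `rollout([ObjectState((0.0, 0.0), (0.0, 0.0)), ObjectState((0.5, 0.0), (0.0, 0.0))], 100)` returns a list of 101 per-timestep state lists. In a model along the lines the abstract describes, the transition would be learned and the predicted states would be rendered back to frames by the image model.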

Code Implementations: 1 repo
