RO · Jul 15, 2019

Vid2Param: Modelling of Dynamics Parameters from Video

arXiv:1907.06422v3 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for robust autonomous agents, such as robots in dynamic environments, to perform online physical reasoning for tasks like prediction and control. It is an incremental advance because it builds on existing methods, namely recurrent variational autoencoders and domain randomization.

The paper tackles the problem of extracting dynamical parameters such as position, velocity, and restitution directly from video, which is challenging because of complex interactions and rich sensory data. It demonstrates that the Vid2Param model can perform online system identification and probabilistic forward prediction, enabling a robot to intercept a bouncing ball under partly occluded vision by estimating the ball's physical parameters from video.

Videos provide a rich source of information, but it is generally hard to extract dynamical parameters of interest. Inferring those parameters from a video stream would be beneficial for physical reasoning. Robots performing tasks in dynamic environments would benefit greatly from understanding the underlying environment motion, in order to make future predictions and to synthesize effective control policies that use this inductive bias. Online physical reasoning is therefore a fundamental requirement for robust autonomous agents. When the dynamics involve multiple modes (due to contacts or interactions between objects) and sensing must proceed directly from a rich sensory stream such as video, traditional methods for system identification may not be well suited. We propose an approach wherein fast parameter estimation can be achieved directly from video. We integrate a physically based dynamics model with a recurrent variational autoencoder by introducing an additional loss to enforce desired constraints. The model, which we call Vid2Param, can be trained entirely in simulation, in an end-to-end manner with domain randomization, to perform online system identification and make probabilistic forward predictions of parameters of interest. This enables the resulting model to encode parameters such as position, velocity, restitution, air drag and other physical properties of the system. We illustrate the utility of this in physical experiments wherein a PR2 robot with a velocity-constrained arm must intercept an unknown bouncing ball with partly occluded vision, by estimating the physical parameters of this ball directly from the video trace after the ball is released.
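
To make the idea of training in simulation with domain randomization more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional ball under gravity with linear air drag and a restitution coefficient applied at ground contact, and it draws the physical parameters at random for each episode. The function names (`simulate_bounce`, `sample_parameters`), the parameter ranges, and the integration scheme are all illustrative assumptions; the paper itself works from rendered video rather than raw height traces.

```python
import random


def simulate_bounce(height, velocity, restitution, drag,
                    g=9.81, dt=0.001, steps=3000):
    """Integrate a 1-D bouncing ball with semi-implicit Euler steps.

    All arguments are hypothetical illustration parameters:
    height (m), velocity (m/s), restitution (unitless), drag (1/s).
    """
    trajectory = []
    for _ in range(steps):
        # Gravity plus linear air drag opposing the current motion.
        accel = -g - drag * velocity
        velocity += accel * dt
        height += velocity * dt
        if height < 0.0 and velocity < 0.0:
            # Ground contact: clamp to the floor and reflect the velocity,
            # scaled down by the restitution coefficient.
            height = 0.0
            velocity = -restitution * velocity
        trajectory.append(height)
    return trajectory


def sample_parameters(rng):
    """Domain randomization: draw one set of physical parameters.

    The ranges below are assumptions chosen only for illustration.
    """
    return {
        "height": rng.uniform(0.5, 2.0),
        "velocity": rng.uniform(-1.0, 1.0),
        "restitution": rng.uniform(0.5, 0.95),
        "drag": rng.uniform(0.0, 0.5),
    }


# Build a small synthetic dataset of (parameters, trajectory) pairs.
rng = random.Random(0)
dataset = []
for _ in range(5):
    params = sample_parameters(rng)
    dataset.append((params, simulate_bounce(**params)))
```

Each pair couples a ground-truth parameter set with the trajectory it produces, which is the kind of supervision a parameter-estimation model could be trained on before being applied to real observations.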

