CV · LG · Oct 13, 2016

Predicting the dynamics of 2d objects with a deep residual network

arXiv:1610.04032v2 · 3 citations
AI Analysis

This paper addresses the problem of learning physical dynamics from visual data, with applications in robotics and simulation. The contribution is incremental: it builds on existing methods and uses a simple 2D setup.

The paper tackles the problem of predicting the dynamics of interacting 2D objects with a deep residual network trained as an image-to-image regression task. The network accurately predicts the resulting configuration, which implies that it has learned to segment the objects, infer which one contains the grasping point, and handle collisions to some extent.

We investigate how a residual network can learn to predict the dynamics of interacting shapes purely as an image-to-image regression task. With a simple 2d physics simulator, we generate short sequences composed of rectangles put in motion by applying a pulling force at a point picked at random. The network is trained with a quadratic loss to predict the image of the resulting configuration, given the image of the starting configuration and an image indicating the point of grasping. Experiments show that the network learns to predict accurately the resulting image, which implies in particular that (1) it segments rectangles as distinct components, (2) it infers which one contains the grasping point, (3) it models properly the dynamic of a single rectangle, including the torque, (4) it detects and handles collisions to some extent, and (5) it re-synthesizes properly the entire scene with displaced rectangles.
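
The input/target layout and the quadratic loss described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 16-pixel image size, the single-pixel encoding of the grasping point, the helper names, and the use of a pure translation as a stand-in for the simulated motion. No network is trained; the sketch only shows what one training example and its loss might look like.

```python
# Hypothetical sketch: names, image size, grasp-point encoding, and the
# translation-only "motion" are illustrative assumptions, not paper details.

SIZE = 16  # assumed image side length in pixels


def blank_image(size=SIZE):
    """Return a size x size image filled with zeros."""
    return [[0.0] * size for _ in range(size)]


def draw_rectangle(image, top, left, height, width):
    """Set the pixels of an axis-aligned rectangle to 1.0 (in place)."""
    for row in range(top, top + height):
        for col in range(left, left + width):
            image[row][col] = 1.0
    return image


def grasp_image(row, col, size=SIZE):
    """Image that is zero everywhere except at the grasping point."""
    image = blank_image(size)
    image[row][col] = 1.0
    return image


def quadratic_loss(predicted, target):
    """Sum of squared pixel differences between two images."""
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    )


# One training example: a two-channel input and a one-channel target.
start = draw_rectangle(blank_image(), top=4, left=3, height=3, width=5)
grasp = grasp_image(row=5, col=7)   # a point inside the rectangle
network_input = [start, grasp]      # channels: scene, grasping point
# Assumed outcome: the grasped rectangle is pulled 3 pixels to the right.
target = draw_rectangle(blank_image(), top=4, left=6, height=3, width=5)

# A baseline that predicts "nothing moves" is penalized on every pixel
# the rectangle vacates or newly occupies; a perfect prediction costs 0.
baseline_loss = quadratic_loss(start, target)
perfect_loss = quadratic_loss(target, target)
```

In this toy example the "nothing moves" baseline differs from the target on 18 pixels, so its loss is 18.0. A trained network would have to use the grasping-point channel to decide which rectangle to displace in order to do better.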
