CV · Nov 12, 2015

Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images

Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

arXiv:1511.04048v120.9155 citations

Originality Highly original

AI Analysis

This work addresses the challenge of physical understanding in computer vision, offering a novel approach for predicting object dynamics from single images, which could benefit robotics and AI systems.

The paper tackles the problem of predicting object dynamics from static images. It introduces Newtonian scenarios, which serve as intermediate physical abstractions, and a neural network that maps a single image to a state in one of these scenarios. The method predicts dynamics reliably and supports its predictions with velocity and force vectors.

In this paper, we study the challenging problem of predicting the dynamics of objects in static images. Given a query object in an image, our goal is to provide a physical understanding of the object in terms of the forces acting upon it and its long-term motion in response to those forces. Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging. We define intermediate physical abstractions called Newtonian scenarios and introduce the Newtonian Neural Network ($N^3$), which learns to map a single image to a state in a Newtonian scenario. Our experimental evaluations show that our method can reliably predict the dynamics of a query object from a single image. In addition, our approach can provide physical reasoning that supports the predicted dynamics in terms of velocity and force vectors. To spur research in this direction, we compiled the Visual Newtonian Dynamics (VIND) dataset, which includes 6806 videos aligned with Newtonian scenarios represented using game engines, and 4516 still images with their ground truth dynamics.
