Learning to Poke by Poking: Experiential Learning of Intuitive Physics
This work addresses robotic manipulation by learning internal models from direct interaction. The contribution is incremental in that it builds on existing deep learning and model-based approaches.
The paper tackles the problem of enabling robots to learn intuitive physics through experiential learning, specifically by poking objects to displace them to target locations. Trained on over 400 hours and more than 100K pokes of real-world robotic experience, the approach outperforms alternative methods.
We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of the robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict, in turn regularizing the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space, which alleviates the need to predict pixels. Our experiments show that this joint modeling approach outperforms alternative methods.
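The joint objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain linear maps stand in for the deep networks, both losses are mean squared error, and the weighting term `lam`, the dimensions, and all function names are assumptions made for illustration. The point it shows is that one shared encoder feeds both models, and that the forward model predicts the next state in feature space rather than in pixel space.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned network parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def joint_loss(params, img_t, img_next, action, lam=0.1):
    """Return (total, inverse, forward) losses; total = inverse + lam * forward.

    The MSE losses and the weighting `lam` are illustrative assumptions.
    """
    # A shared encoder maps both images into the same feature space.
    z_t = linear(params["encoder"], img_t)
    z_next = linear(params["encoder"], img_next)
    # Inverse model: predict the action from the before/after features.
    action_pred = linear(params["inverse"], z_t + z_next)
    inv = mse(action_pred, action)
    # Forward model: predict the next features from current features and action.
    # The target is a feature vector, so no pixels are predicted.
    z_next_pred = linear(params["forward"], z_t + action)
    fwd = mse(z_next_pred, z_next)
    return inv + lam * fwd, inv, fwd

rng = random.Random(0)
IMG_DIM, FEAT_DIM, ACT_DIM = 16, 4, 3  # hypothetical sizes
params = {
    "encoder": random_matrix(FEAT_DIM, IMG_DIM, rng),
    "inverse": random_matrix(ACT_DIM, 2 * FEAT_DIM, rng),
    "forward": random_matrix(FEAT_DIM, FEAT_DIM + ACT_DIM, rng),
}
img_t = [rng.random() for _ in range(IMG_DIM)]     # flattened "before" image
img_next = [rng.random() for _ in range(IMG_DIM)]  # flattened "after" image
action = [0.5, -0.2, 0.1]                          # hypothetical poke parameters
total, inv, fwd = joint_loss(params, img_t, img_next, action)
print(total, inv, fwd)
```

In a real training loop the gradients of `total` would update the encoder through both terms, which is how the forward objective can regularize the features used by the inverse model.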