Structured agents for physical construction
This work addresses the challenge of building AI agents capable of intuitive physics and planning in physical construction tasks. The contribution is incremental: it extends existing deep reinforcement learning methods with structured representations and policies.
The paper tackles physical construction tasks with deep reinforcement learning agents. It finds that agents with structured representations and policies outperform less structured ones and generalize better to larger scenes, and that model-based agents using Monte-Carlo Tree Search outperform strictly model-free agents on the most challenging problems.
Physical construction---the ability to compose objects, subject to physical dynamics, to serve some function---is fundamental to human intelligence. We introduce a suite of challenging physical construction tasks inspired by how children play with blocks, such as matching a target configuration, stacking blocks to connect objects together, and creating shelter-like structures over target objects. We examine how a range of deep reinforcement learning agents fare on these challenges, and introduce several new approaches which provide superior performance. Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes. Model-based agents which use Monte-Carlo Tree Search also outperform strictly model-free agents in our most challenging construction problems. We conclude that approaches which combine structured representations and reasoning with powerful learning are a key path toward agents that possess rich intuitive physics, scene understanding, and planning.
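To make "structured representations" and "object-centric actions" concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's implementation: the `Block` fields, the fully connected graph, and the "place block i on block j" action form are hypothetical choices for illustration, and there is no physics simulation. The sketch only shows how an action space defined over graph edges is expressed in terms of objects rather than fixed coordinates, so the same definition applies to scenes with any number of objects.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Block:
    """Hypothetical object node: a named block with a position and a status flag."""
    name: str
    x: float
    y: float
    placed: bool  # True if already part of the scene, False if still available


def scene_graph(blocks):
    """Build a fully connected directed graph: nodes are blocks, edges are ordered index pairs."""
    nodes = list(blocks)
    edges = [(i, j) for i, j in product(range(len(nodes)), repeat=2) if i != j]
    return nodes, edges


def object_centric_actions(nodes, edges):
    """One candidate action per edge from an available block i to a placed block j,
    read as 'put block i on top of block j'."""
    return [(i, j) for i, j in edges if not nodes[i].placed and nodes[j].placed]


def apply_action(nodes, action):
    """Toy transition (no physics): move block i directly above block j and mark it placed."""
    i, j = action
    src, dst = nodes[i], nodes[j]
    moved = Block(src.name, dst.x, dst.y + 1.0, True)
    return [moved if k == i else b for k, b in enumerate(nodes)]


if __name__ == "__main__":
    blocks = [
        Block("floor", 0.0, 0.0, True),
        Block("a", -3.0, 0.0, False),
        Block("b", -3.0, 1.0, False),
    ]
    nodes, edges = scene_graph(blocks)
    for i, j in object_centric_actions(nodes, edges):
        print(f"place {nodes[i].name} on {nodes[j].name}")
```

In a learned agent, a scoring function (for example, per-edge Q-values) would rank these candidate actions; that part is omitted here. A planner such as Monte-Carlo Tree Search could use a transition function in the role of `apply_action` to simulate action sequences before committing to one.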