RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks
This work addresses the challenge of robotic manipulation of deformable objects in industrial and household tasks. It is an incremental advance that integrates existing methods, such as graph neural networks (GNNs) and model-predictive control (MPC), into a new framework for this specific domain.
The paper tackles the problem of robotic manipulation of elasto-plastic objects by proposing RoboCraft, a system that uses a particle-based representation and graph neural networks to learn dynamics models from visual observations. With just 10 minutes of real-world interaction data, the robot can deform objects into target shapes, achieving performance comparable to, and sometimes better than, that of human subjects on the tested tasks.
Modeling and manipulating elasto-plastic objects are essential capabilities for robots to perform complex industrial and household interaction tasks (e.g., stuffing dumplings, rolling sushi, and making pottery). However, due to the high degree of freedom of elasto-plastic objects, significant challenges exist in virtually every aspect of the robotic manipulation pipeline, e.g., representing the states, modeling the dynamics, and synthesizing the control signals. We propose to tackle these challenges by employing a particle-based representation for elasto-plastic objects in a model-based planning framework. Our system, RoboCraft, only assumes access to raw RGBD visual observations. It transforms the sensing data into particles and learns a particle-based dynamics model using graph neural networks (GNNs) to capture the structure of the underlying system. The learned model can then be coupled with model-predictive control (MPC) algorithms to plan the robot's behavior. We show through experiments that with just 10 minutes of real-world robotic interaction data, our robot can learn a dynamics model that can be used to synthesize control signals to deform elasto-plastic objects into various target shapes, including shapes that the robot has never encountered before. We perform systematic evaluations in both simulation and the real world to demonstrate the robot's manipulation capabilities and ability to generalize to a more complex action space, different tool shapes, and a mixture of motion modes. We also conduct comparisons between RoboCraft and untrained human subjects controlling the gripper to manipulate deformable objects in both simulation and the real world. Our learned model-based planning framework is comparable to and sometimes better than human subjects on the tested tasks.
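The planning loop described in the abstract (particles in, a dynamics model rolled forward, actions chosen to reach a target shape) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the planner or the shape cost, so the random-shooting planner, the Chamfer-distance cost, and every name below (`chamfer_distance`, `toy_dynamics`, `plan_random_shooting`) are assumptions made for illustration. In particular, `toy_dynamics` is a hypothetical stand-in that merely translates the particles; in RoboCraft this role is played by the learned GNN.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def toy_dynamics(particles, action):
    """Hypothetical stand-in for a learned dynamics model.

    Assumption: the action is a 3D displacement applied to every particle.
    A learned GNN would instead predict per-particle motion from the
    particle graph and the gripper action.
    """
    return particles + action


def plan_random_shooting(particles, target, dynamics,
                         horizon=3, n_samples=64, scale=0.1, rng=None):
    """Sample action sequences, roll each out, keep the lowest-cost one."""
    rng = np.random.default_rng(0) if rng is None else rng
    candidates = rng.uniform(-scale, scale, size=(n_samples, horizon, 3))
    # Include the all-zero sequence so the plan is never worse than not acting.
    candidates[0] = 0.0

    best_cost, best_seq = np.inf, None
    for seq in candidates:
        state = particles
        for action in seq:
            state = dynamics(state, action)
        cost = chamfer_distance(state, target)
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq, best_cost


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    particles = rng.normal(size=(50, 3))            # current shape
    target = particles + np.array([0.1, 0.0, 0.0])  # shifted target shape
    seq, cost = plan_random_shooting(particles, target, toy_dynamics)
    print("initial cost:", chamfer_distance(particles, target))
    print("planned cost:", cost)
```

In a model-predictive control setting, only the first action of the returned sequence would typically be executed, after which the scene is observed again and the plan recomputed.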