Translating Natural Language Instructions to Computer Programs for Robot Manipulation
This work aims to improve human-robot interaction by enabling robots to follow natural language instructions, particularly for manipulation tasks.
This paper addresses the challenge of robots understanding natural language instructions by translating each instruction into a Python function. The function queries scene information from an object detector and controls the robot; this approach outperforms a neural network trained to predict robot actions directly.
It is highly desirable for robots that work alongside humans to be able to understand instructions in natural language. Existing language-conditioned imitation learning models directly predict the actuator commands from the image observation and the instruction text. Rather than directly predicting actuator commands, we propose translating the natural language instruction into a Python function that queries the scene by accessing the output of an object detector and controls the robot to perform the specified task. This enables the use of non-differentiable modules, such as a constraint solver, when computing commands for the robot. Moreover, the labels in this setup are computer programs that capture the intent of the expert, which makes them significantly more informative than teleoperated demonstrations. We show that the proposed method performs better than training a neural network to directly predict the robot actions.
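To make the idea concrete, here is a minimal sketch of what a generated program might look like for the instruction "put the red block to the left of the blue bowl". Everything in it is hypothetical and not taken from this work: the `Detection` record standing in for the object detector's output, the `MockRobot` interface that merely logs commands, the helpers `find_object` and `solve_placement`, and the convention of integer table coordinates in centimetres with x growing to the right. The brute-force grid search in `solve_placement` stands in for a constraint solver; it is not the solver used here, only an illustration of a non-differentiable module that a program can call.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Detection:
    """Hypothetical object-detector output: one record per detected object."""
    name: str
    color: str
    x: int  # table-plane position in centimetres (assumed convention)
    y: int


@dataclass
class MockRobot:
    """Hypothetical robot interface that only records the commands it receives."""
    log: list = field(default_factory=list)

    def pick(self, obj: Detection) -> None:
        self.log.append(("pick", f"{obj.color} {obj.name}"))

    def place(self, x: int, y: int) -> None:
        self.log.append(("place", x, y))


def find_object(detections, name, color=None):
    """Scene query: return the first detection matching name (and color, if given)."""
    for d in detections:
        if d.name == name and (color is None or d.color == color):
            return d
    raise LookupError(f"no matching {name} in the scene")


def solve_placement(constraints, occupied, target, clearance=10, size=100, step=5):
    """Brute-force grid search over the table (a non-differentiable solver).

    Keeps every grid point that satisfies all constraints and lies at least
    `clearance` cm from every occupied object, then returns the feasible
    point closest to `target`.
    """
    feasible = [
        (x, y)
        for x, y in product(range(0, size + 1, step), repeat=2)
        if all(c(x, y) for c in constraints)
        and all((x - o.x) ** 2 + (y - o.y) ** 2 >= clearance ** 2 for o in occupied)
    ]
    if not feasible:
        raise ValueError("no position satisfies the constraints")
    return min(feasible, key=lambda p: (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2)


# A program that might be generated for the instruction
# "put the red block to the left of the blue bowl".
def task(detections, robot):
    block = find_object(detections, "block", "red")
    bowl = find_object(detections, "bowl", "blue")
    others = [d for d in detections if d is not block]
    x, y = solve_placement(
        constraints=[
            lambda x, y: x <= bowl.x - 10,      # left of the bowl (x grows to the right)
            lambda x, y: abs(y - bowl.y) <= 5,  # roughly level with the bowl
        ],
        occupied=others,
        target=(bowl.x, bowl.y),
    )
    robot.pick(block)
    robot.place(x, y)


if __name__ == "__main__":
    scene = [
        Detection("block", "red", 80, 20),
        Detection("bowl", "blue", 50, 50),
        Detection("block", "green", 20, 50),
    ]
    robot = MockRobot()
    task(scene, robot)
    print(robot.log)  # [('pick', 'red block'), ('place', 40, 50)]
```

Because the generated program is ordinary Python, the placement step can call any search routine or solver, with no requirement that it be differentiable, while the scene queries keep the program's intent readable as a label.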