OpenD: A Benchmark for Language-Driven Door and Drawer Opening
This work addresses the challenge of integrating language understanding with spatial reasoning and long-term manipulation in robotics, though its contribution is incremental because it builds on existing methods in simulation.
The authors tackle language-driven robotic manipulation by introducing OPEND, a benchmark for opening doors and drawers in simulation, and propose a multi-step planner that they evaluate zero-shot on the test dataset.
We introduce OPEND, a benchmark for learning, from language instructions, how to use a robot hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment. To solve the task, we propose a multi-step planner composed of a deep neural network and rule-based controllers. The network captures spatial relationships from images and extracts semantic meaning from the language instructions. The controllers then efficiently execute the plan based on this spatial and semantic understanding. We evaluate our system by measuring its zero-shot performance on the test dataset. Experimental results demonstrate the effectiveness of our multi-step planner's decision planning across different hands, while suggesting that there is significant room for developing better models to address the challenges posed by language understanding, spatial reasoning, and long-term manipulation. We will release OPEND and host challenges to promote future research in this area.
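
To make the two-stage structure of the planner concrete, the following Python sketch separates an understanding stage from a rule-based controller. It is an illustration under stated assumptions, not the OPEND implementation: the names `Target`, `Step`, `understand`, and `plan` are hypothetical, the neural network is replaced by plain keyword matching over pre-labelled detections, and the geometry assumes a cabinet whose front faces the robot along the negative x axis, with a door hinge to the +y side of the handle.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Target:
    """Hypothetical output of the understanding stage."""
    kind: str      # "drawer" or "door"
    handle: Vec3   # handle position in the robot frame (metres)


@dataclass
class Step:
    """One controller step: an action name and an end-effector goal."""
    action: str    # "approach", "grasp", "pull", or "release"
    goal: Vec3


def understand(instruction: str,
               detections: Dict[str, Tuple[str, Vec3]]) -> Target:
    """Stand-in for the neural network: keyword matching, not a learned model.

    `detections` maps a label such as "top drawer" to (kind, handle position);
    in a real system these would come from image-based perception.
    """
    text = instruction.lower()
    for label, (kind, handle) in detections.items():
        if label in text:
            return Target(kind, handle)
    raise ValueError(f"no detected part matches: {instruction!r}")


def plan(target: Target, standoff: float = 0.10, pull: float = 0.25,
         radius: float = 0.40, angle: float = math.pi / 3) -> List[Step]:
    """Rule-based controller: approach, grasp, pull, release.

    Drawers are pulled straight back along -x by `pull` metres. Doors are
    swung by `angle` radians about an assumed hinge `radius` metres to the
    +y side of the handle; only the end point of the arc is emitted here,
    whereas a real controller would follow intermediate waypoints.
    """
    x, y, z = target.handle
    steps = [Step("approach", (x - standoff, y, z)),
             Step("grasp", (x, y, z))]
    if target.kind == "drawer":
        end = (x - pull, y, z)
    else:
        end = (x - radius * math.sin(angle),
               y + radius * (1.0 - math.cos(angle)),
               z)
    steps += [Step("pull", end), Step("release", end)]
    return steps
```

Calling `plan(understand("Open the top drawer", detections))` with a `detections` dictionary that contains a `"top drawer"` entry returns the four steps in order. Keeping the two stages behind separate functions mirrors the split described above: the understanding stage could be swapped for a learned model without changing the controller rules.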