Closed Loop Interactive Embodied Reasoning for Robot Manipulation
This work addresses the challenge of robust robotic manipulation in dynamic environments, representing an incremental improvement in embodied reasoning systems.
The paper tackles the problem of enabling robots to perform complex manipulation tasks through embodied reasoning by introducing a closed-loop interactive approach that accounts for non-visual properties and environmental changes, achieving success rates above 76% in simulation and 64% in real-world tasks.
Embodied reasoning systems integrate robotic hardware and cognitive processes to perform complex tasks, typically in response to a natural language query about a specific physical environment. This usually involves updating the belief about the scene or physically interacting with and changing the scene (e.g., sorting the objects from lightest to heaviest). To facilitate the development of such systems, we introduce a new modular Closed Loop Interactive Embodied Reasoning (CLIER) approach that takes into account measurements of non-visual object properties, changes in the scene caused by external disturbances, and uncertain outcomes of robotic actions. CLIER performs multi-modal reasoning and action planning and generates a sequence of primitive actions that can be executed by a robot manipulator. Our method operates in a closed loop, responding to changes in the environment. Our approach is developed using the MuBle simulation environment and tested in 10 interactive benchmark scenarios. We extensively evaluate our reasoning approach in simulation and in real-world manipulation tasks, achieving success rates above 76% and 64%, respectively.
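The closed-loop idea can be illustrated with a minimal Python sketch. This is not CLIER's implementation and does not use the MuBle API; `ToyScene`, `plan_next_action`, and `closed_loop_sort` are hypothetical names introduced only for illustration. The sketch assumes a toy scene in which weight is a non-visual property that must be measured, a swap of two neighbouring objects is the only primitive action, and an action may fail. The scene is re-observed on every iteration, so a failed action or an external rearrangement is simply picked up by the next planning step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToyScene:
    """Hypothetical stand-in for a tabletop scene (not the MuBle API)."""
    order: List[str]            # current left-to-right arrangement
    weights: Dict[str, float]   # non-visual property, observable only by weighing

    def weigh(self, name: str) -> float:
        # Simulated measurement of a non-visual property.
        return self.weights[name]

    def swap(self, i: int, j: int, succeed: bool = True) -> None:
        # Primitive action with an uncertain outcome: it may silently fail.
        if succeed:
            self.order[i], self.order[j] = self.order[j], self.order[i]


def plan_next_action(observed: List[str],
                     belief: Dict[str, float]) -> Optional[Tuple[int, int]]:
    """Return the first adjacent pair that is out of order, or None if sorted."""
    for i in range(len(observed) - 1):
        if belief[observed[i]] > belief[observed[i + 1]]:
            return (i, i + 1)
    return None


def closed_loop_sort(scene: ToyScene,
                     action_succeeds: Callable[[int], bool],
                     max_steps: int = 50) -> List[str]:
    """Sort objects from lightest to heaviest, re-observing after every action."""
    belief: Dict[str, float] = {}
    log: List[str] = []
    for step in range(max_steps):
        observed = list(scene.order)          # perceive the current scene
        for name in observed:                 # measure weights not yet known
            if name not in belief:
                belief[name] = scene.weigh(name)
                log.append(f"weigh {name}")
        action = plan_next_action(observed, belief)
        if action is None:                    # goal reached
            return log
        i, j = action
        ok = action_succeeds(step)            # assumed failure model
        scene.swap(i, j, succeed=ok)
        log.append(f"swap {i} {j} {'ok' if ok else 'failed'}")
    raise RuntimeError("step budget exhausted before the scene was sorted")


if __name__ == "__main__":
    scene = ToyScene(order=["c", "a", "b"],
                     weights={"a": 1.0, "b": 2.0, "c": 3.0})
    # The very first action fails; the loop recovers by re-planning.
    for entry in closed_loop_sort(scene, lambda step: step != 0):
        print(entry)
    print(scene.order)
```

An open-loop planner would compute the full swap sequence once and execute it blindly, so the failed first swap would leave the objects in the wrong order. Here the plan is recomputed from a fresh observation at every step, which is the property the abstract refers to as operating in a closed loop.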