EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution
This work addresses challenges in robotic task planning for real-world applications, though it appears incremental because it builds on existing foundation models.
The paper tackles task planning for robots in real-life settings by introducing EMPOWER, a framework for open-vocabulary online grounding and planning. On a TIAGo robot, the framework achieved an average success rate of 0.73 across six scenarios.
Task planning for robots in real-life settings presents significant challenges, which stem from three primary issues: the difficulty of identifying grounded sequences of steps to achieve a goal; the lack of a standardized mapping between high-level actions and low-level commands; and the need to keep computational overhead low given the limited resources of robotic hardware. To address these issues, we introduce EMPOWER, a framework for open-vocabulary online grounding and planning for embodied agents. By leveraging efficient pre-trained foundation models and a multi-role mechanism, EMPOWER demonstrates notable improvements in grounded planning and execution. Quantitative results highlight the effectiveness of our approach, which achieves an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.