RO | AI | Dec 23, 2024

Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples

arXiv:2412.17288v1 | 10 citations | h-index: 8 | Has Code | AAAI
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of reducing annotation costs for robotic assistants that perform complex tasks, though the advance is incremental because it builds on existing LLM-based planning.

The paper tackles the problem of generating environment-grounded plans for robotic agents from natural language instructions with minimal data. It proposes FLARE, which combines the language command with visual perception to correct ambiguous or incorrect instructions, and reports that it outperforms state-of-the-art methods.

Learning a perception and reasoning module for robotic assistants to plan steps to perform complex tasks based on natural language instructions often requires large free-form language annotations, especially for short high-level instructions. To reduce the cost of annotation, large language models (LLMs) are used as planners with little data. However, when elaborating the steps, even the state-of-the-art planner that uses LLMs mostly relies on linguistic common sense, often neglecting the status of the environment at command reception, resulting in inappropriate plans. To generate plans grounded in the environment, we propose FLARE (Few-shot Language with environmental Adaptive Replanning Embodied agent), which improves task planning using both the language command and environmental perception. As language instructions often contain ambiguities or incorrect expressions, we additionally propose to correct the mistakes using visual cues from the agent. The proposed scheme allows us to use only a few language pairs thanks to the visual cues and outperforms state-of-the-art approaches. Our code is available at https://github.com/snumprlab/flare.
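
The idea of correcting a plan with visual cues can be illustrated with a small sketch. This is not taken from the FLARE repository: the function names `similarity` and `replan`, the `(action, target)` plan format, and the use of character-level string similarity are all assumptions made for illustration. A real system would more plausibly compare object names in a learned embedding space. The sketch only shows the general shape of the step: when a plan refers to an object the agent has not observed, swap in the most similar object it has observed.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for semantic similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def replan(plan, observed_objects):
    """Replace plan targets that were not observed with the closest observed object.

    plan: list of (action, target) tuples (an assumed format, not FLARE's own).
    observed_objects: names of objects the agent has detected in the scene.
    """
    observed = list(observed_objects)
    if not observed:
        # Nothing has been seen yet, so there is nothing to ground against.
        return list(plan)
    fixed = []
    for action, target in plan:
        if target not in observed:
            # The instruction named something the agent cannot see;
            # fall back to the most similar object that is visible.
            target = max(observed, key=lambda name: similarity(target, name))
        fixed.append((action, target))
    return fixed


if __name__ == "__main__":
    plan = [("ToggleObject", "Lamp"), ("PickupObject", "Mug")]
    print(replan(plan, ["FloorLamp", "Sofa", "Mug"]))
```

In this toy run the instruction says "Lamp" while the scene contains only "FloorLamp", so the first step is retargeted and the second step, whose object is visible, is left unchanged. String similarity would not map "Couch" to "Sofa", which is why a semantic measure would be needed in practice.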

Code Implementations: 1 repo