AI · RO · Dec 5, 2024

TANGO: Training-free Embodied AI Agents for Open-world Tasks

arXiv:2412.10402v1 · 23 citations · h-index: 4 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating flexible, training-free embodied agents for robotics and AI applications, though it builds incrementally on existing LLM-based program composition methods.

The paper tackles the problem of enabling embodied AI agents to perform diverse open-world tasks without additional training by extending LLM-based program composition to embodied settings. The result is a single model that achieves state-of-the-art performance on three Embodied AI tasks in zero-shot scenarios.

Large Language Models (LLMs) have demonstrated excellent capabilities in composing various modules together to create programs that can perform complex reasoning tasks on images. In this paper, we propose TANGO, an approach that extends the program composition via LLMs already observed for images, aiming to integrate those capabilities into embodied agents capable of observing and acting in the world. Specifically, by employing a simple PointGoal Navigation model combined with a memory-based exploration policy as a foundational primitive for guiding an agent through the world, we show how a single model can address diverse tasks without additional training. We task an LLM with composing the provided primitives to solve a specific task, using only a few in-context examples in the prompt. We evaluate our approach on three key Embodied AI tasks: Open-Set ObjectGoal Navigation, Multi-Modal Lifelong Navigation, and Open Embodied Question Answering, achieving state-of-the-art results without any specific fine-tuning in challenging zero-shot scenarios.
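The composition loop described in the abstract can be illustrated with a minimal sketch. It assumes a toy 1-D corridor world. The primitive names (`explore_scene`, `detect_objects`, `navigate_to`, `stop`), the prompt layout, and `fake_llm` are all hypothetical and are not TANGO's actual modules or API. `fake_llm` stands in for a real LLM call and returns a fixed program. The sketch shows the general idea only. The LLM receives a list of primitives and a few (task, program) examples, then emits a short program for the new task. That program runs with only the primitives in scope.

```python
from dataclasses import dataclass, field

# Hypothetical primitives for a toy 1-D corridor; NOT TANGO's real modules.
@dataclass
class ToyAgent:
    objects: dict                  # object name -> integer position in the corridor
    length: int = 10               # corridor length; exploration fails beyond it
    position: int = 0
    log: list = field(default_factory=list)

    def explore_scene(self):
        # Stand-in for the exploration policy: advance one cell along the corridor.
        if self.position + 1 >= self.length:
            raise RuntimeError("exploration exhausted")
        self.position += 1
        self.log.append(f"explore->{self.position}")

    def detect_objects(self, name):
        # Stand-in for a detector: report the object's position once it is within one cell.
        pos = self.objects.get(name)
        if pos is not None and abs(pos - self.position) <= 1:
            return pos
        return None

    def navigate_to(self, goal):
        # Stand-in for a PointGoal navigation policy: move straight to the goal point.
        self.position = goal
        self.log.append(f"navigate->{goal}")

    def stop(self):
        self.log.append("stop")
        return self.position


# One (task, program) in-context example; a real prompt would hold several.
FEW_SHOT_EXAMPLES = [
    ("Find a chair.",
     "while (p := detect_objects('chair')) is None:\n"
     "    explore_scene()\n"
     "navigate_to(p)\n"
     "result = stop()\n"),
]


def build_prompt(task, examples=FEW_SHOT_EXAMPLES):
    # Prompt = primitive list, then few-shot (task, program) pairs, then the new task.
    header = "Primitives: explore_scene(), detect_objects(name), navigate_to(goal), stop()\n\n"
    shots = "".join(f"Task: {t}\nProgram:\n{p}\n" for t, p in examples)
    return header + shots + f"Task: {task}\nProgram:\n"


def fake_llm(prompt):
    # Placeholder for a real LLM call: ignores the prompt and returns a fixed
    # program for "Find a sofa.".
    return FEW_SHOT_EXAMPLES[0][1].replace("chair", "sofa")


def run_program(program, agent):
    # Execute the generated program with only the primitives in scope.
    # Emptying __builtins__ restricts available names but is NOT a security sandbox.
    namespace = {
        "explore_scene": agent.explore_scene,
        "detect_objects": agent.detect_objects,
        "navigate_to": agent.navigate_to,
        "stop": agent.stop,
    }
    exec(program, {"__builtins__": {}}, namespace)
    return namespace.get("result")


agent = ToyAgent(objects={"sofa": 4})
final = run_program(fake_llm(build_prompt("Find a sofa.")), agent)
print(final, agent.log)
# → 4 ['explore->1', 'explore->2', 'explore->3', 'navigate->4', 'stop']
```

In this sketch, the generated program sees only the callables passed in `namespace`. Replacing the toy stand-ins with real perception, exploration, and PointGoal navigation models would leave the composition step unchanged. Whether TANGO structures its execution this way is not stated in the summary above.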

