AI | CL | CV | LG | Apr 29, 2024

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

arXiv:2404.19065v1 | 6 citations | h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building a versatile agent for interactive vision-language tasks. It is an incremental advance because it builds on the existing HELPER agent, extending its memory and adding question-asking APIs.

The authors set out to build a unified instructable embodied agent that works across multiple interactive vision-language domains. They expand HELPER's memory with more examples and prompts and integrate additional APIs for asking questions, achieving few-shot state-of-the-art performance on four benchmarks without in-domain training.

Recent research on instructable agents has used memory-augmented Large Language Models (LLMs) as task planners, a technique that retrieves language-program examples relevant to the input instruction and uses them as in-context examples in the LLM prompt to improve the performance of the LLM in inferring the correct action and task plans. In this technical report, we extend the capabilities of HELPER by expanding its memory with a wider array of examples and prompts, and by integrating additional APIs for asking questions. This simple expansion of HELPER into a shared memory enables the agent to work across the domains of executing plans from dialogue, natural language instruction following, active question asking, and commonsense room reorganization. We evaluate the agent on four diverse interactive visual-language embodied agent benchmarks: ALFRED, TEACh, DialFRED, and the Tidy Task. HELPER-X achieves few-shot, state-of-the-art performance across these benchmarks using a single agent, without requiring in-domain training, and remains competitive with agents that have undergone in-domain training.
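
The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the memory entries, the action names inside the programs, and the helper functions `retrieve` and `build_prompt` are all hypothetical, and a bag-of-words cosine similarity stands in for whatever learned embedding a real system would likely use. The sketch only shows the general pattern of pulling the most similar language-program pairs from a shared, multi-domain memory and placing them in the prompt as in-context examples.

```python
import math
import re
from collections import Counter

# Hypothetical shared memory of (domain, instruction, program) triples.
# The program strings use placeholder action names, not the HELPER API.
MEMORY = [
    ("TEACh", "make a cup of coffee",
     "pickup('Mug'); place('CoffeeMachine'); toggle_on('CoffeeMachine')"),
    ("ALFRED", "put a clean plate on the table",
     "pickup('Plate'); clean('Plate'); place('DiningTable')"),
    ("DialFRED", "ask where the knife is",
     "ask('Where is the knife?')"),
    ("Tidy Task", "tidy up the misplaced pillow",
     "pickup('Pillow'); place('Bed')"),
]


def bag_of_words(text):
    """Lowercased word counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(instruction, memory, k=2):
    """Return the k memory entries most similar to the instruction."""
    query = bag_of_words(instruction)
    ranked = sorted(memory,
                    key=lambda ex: cosine(query, bag_of_words(ex[1])),
                    reverse=True)
    return ranked[:k]


def build_prompt(instruction, examples):
    """Place retrieved pairs before the new instruction as in-context examples."""
    parts = ["Translate the instruction into a program."]
    for _, text, program in examples:
        parts.append(f"Instruction: {text}\nProgram: {program}")
    parts.append(f"Instruction: {instruction}\nProgram:")
    return "\n\n".join(parts)


examples = retrieve("make coffee in a clean mug", MEMORY, k=2)
prompt = build_prompt("make coffee in a clean mug", examples)
print(prompt)
```

Because all four domains share one memory in this sketch, the retrieved examples can come from different benchmarks for a single query, which is the property the abstract attributes to the shared memory. The resulting prompt would then be sent to an LLM, a step omitted here.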

