AI | May 24, 2024

Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

arXiv:2405.15414v1 | 7 citations | h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building open-ended creative agents. It is an incremental advance that applies autonomous embodied verification to creative building tasks in Minecraft.

The paper tackles the problem of enabling AI agents to perform creative tasks with open goals and abstract criteria in Minecraft by introducing autonomous embodied verification techniques, resulting in the Luban agent outperforming baselines by 33% to 100% in visualization and pragmatism on a proposed benchmark.

Building open agents has always been the ultimate goal in AI research, and creative agents are even more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., "mine diamonds" in Minecraft). However, they struggle on creative tasks with open goals and abstract criteria because they cannot bridge the gap between them, and thus lack feedback for self-improvement in solving the task. In this work, we introduce autonomous embodied verification techniques that let agents fill this gap, laying the groundwork for creative tasks. Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification inspired by human design practices: (1) visual verification of 3D structural speculations, which come from agent-synthesized CAD modeling programs; (2) pragmatic verification of the creation by generating and verifying environment-relevant functionality programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines (by 33% to 100%) in both visualization and pragmatism. Additional demos on a real-world robotic arm show the creation potential of Luban in the physical world.
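
The abstract reports Elo ratings derived from human comparisons but does not state how they were computed. As a rough illustration only, the sketch below applies the standard Elo update to a list of pairwise preferences. The agent names, the initial rating of 1000, and the K-factor of 32 are assumptions chosen for the example, not values from the paper.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a, r_b, score_a, k=32.0):
    """Return updated (r_a, r_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was, 0.5 for a tie.
    k=32 is an assumed K-factor, not taken from the paper.
    """
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    # The update is zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


def rate(comparisons, initial=1000.0, k=32.0):
    """Fold a sequence of (preferred, other) pairs into a ratings dict.

    The initial rating of 1000 is an assumption for illustration.
    """
    ratings = {}
    for preferred, other in comparisons:
        r_p = ratings.setdefault(preferred, initial)
        r_o = ratings.setdefault(other, initial)
        ratings[preferred], ratings[other] = update_elo(r_p, r_o, 1.0, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical human judgments: each pair is (preferred build, other build).
    votes = [("luban", "baseline_a"), ("luban", "baseline_b"),
             ("baseline_a", "baseline_b")]
    for name, rating in sorted(rate(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because each update is zero-sum, the mean rating stays at the initial value, so only the differences between agents carry meaning. Sequential Elo also depends on the order of comparisons; a study with many judgments might shuffle and average over several passes.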

