CL, AI, CV, LG, RO · Apr 28, 2024

LEGENT: Open Platform for Embodied Agents

Tsinghua
arXiv:2404.18243v2 · 31 citations · h-index: 41 · ACL
Originality: Incremental advance
AI Analysis

The paper addresses the scarcity of open-source tools for developing embodied agents, a gap that hinders collective progress among AI researchers and developers.

The paper tackles the incomplete integration of LLMs and LMMs into embodied agents by introducing LEGENT, an open platform with a 3D environment and data generation pipeline, resulting in a model that surpasses GPT-4V in embodied tasks.

Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platform for developing embodied agents using LLMs and LMMs. LEGENT offers a dual approach: a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface, and a sophisticated data generation pipeline utilizing advanced algorithms to exploit supervision from simulated worlds at scale. In our experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks, showcasing promising generalization capabilities.
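The abstract's idea of exploiting supervision from simulated worlds can be illustrated with a generic sketch. Because a simulator exposes the full world state, an oracle planner can compute optimal actions and label every step of a trajectory automatically, with no human annotation. Everything below is hypothetical: the toy grid, the BFS planner, the action names, and the (instruction, position, action) sample format are illustrative assumptions, not LEGENT's API or its actual data generation pipeline.

```python
from collections import deque

# Hypothetical toy scene standing in for a simulated 3D world:
# '#' marks an obstacle, '.' marks a free cell.
GRID = [
    "....#",
    ".##.#",
    "...#.",
    "#....",
]

# Hypothetical discrete action space: name -> (row delta, column delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def shortest_actions(grid, start, goal):
    """Oracle planner: BFS over free cells using the simulator's full state.

    Returns the list of action names leading from start to goal,
    or None if the goal is unreachable (or is itself an obstacle).
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # cell -> (previous cell, action taken to reach it)
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for name, (dr, dc) in MOVES.items():
            r, c = cell[0] + dr, cell[1] + dc
            nxt = (r, c)
            # Bounds are checked before indexing, so negative indices never occur.
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#" and nxt not in parent:
                parent[nxt] = (cell, name)
                queue.append(nxt)
    if goal not in parent:
        return None
    # Walk parent links back from the goal, then reverse into forward order.
    actions, cell = [], goal
    while parent[cell] is not None:
        prev, name = parent[cell]
        actions.append(name)
        cell = prev
    return actions[::-1]


def make_episode(grid, start, goal, instruction):
    """Label each visited state with the oracle action (hypothetical sample format)."""
    actions = shortest_actions(grid, start, goal)
    if actions is None:
        return []
    samples, pos = [], start
    for name in actions:
        samples.append((instruction, pos, name))
        dr, dc = MOVES[name]
        pos = (pos[0] + dr, pos[1] + dc)
    return samples


if __name__ == "__main__":
    for sample in make_episode(GRID, (0, 0), (3, 4), "walk to the far corner"):
        print(sample)
```

Running the script prints one labeled (instruction, position, action) sample per step. In a real platform the position would be replaced by rendered observations and the planner by task-specific controllers, but the principle is the same: the simulator's privileged state turns label generation into a cheap, scalable computation.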
