RO · AI · Mar 11, 2025

EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments

arXiv:2503.08604v23 citations · h-index: 26
Originality: Synthesis-oriented
AI Analysis

This paper addresses the fragmented evaluation of home robots controlled by natural language, though it appears incremental because it builds on existing advances in LLMs and embodied intelligence.

The authors tackled the lack of a unified benchmark for complex robot tasks by proposing EMMOE, a benchmark for embodied mobile manipulation in open environments, and demonstrated their agent system's performance with evaluations of different models and policies.

Developing autonomous home robots controlled by natural language has long been a pursuit of humanity. While advancements in large language models (LLMs) and embodied intelligence bring this goal closer, several challenges persist: the lack of a unified benchmark for more complex robot tasks, limited evaluation methods and metrics, and data incompatibility between LLMs and mobile manipulation trajectories. To address these issues, we propose Embodied Mobile Manipulation in Open Environments (EMMOE), a benchmark that requires agents to interpret user instructions and execute long-horizon everyday tasks in continuous space. EMMOE integrates high-level and low-level embodied tasks into a unified framework, along with three new metrics for more diverse assessment. Additionally, we collect EMMOE-100, a dataset that features various task attributes, detailed process annotations, re-plans after failures, and two sub-datasets for LLM training. Furthermore, we design HomieBot, an agent system consisting of an LLM trained with Direct Preference Optimization (DPO), lightweight navigation and manipulation models, and multiple error detection mechanisms. Finally, we demonstrate HomieBot's performance and evaluate different models and policies.
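The abstract names Direct Preference Optimization (DPO) as the LLM training method but does not give details. As a rough illustration only, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities. It is not the authors' implementation: the function name, argument names, and the default `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and under a frozen reference model. `beta` is a
    hypothetical default, not a value taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


if __name__ == "__main__":
    # The policy prefers the chosen response more than the reference does,
    # so the loss falls below the neutral value log(2).
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the loss equals log(2); it decreases as the policy shifts probability toward the preferred response relative to the reference.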

Code Implementations: 1 repo
