RO, CL, CV | Nov 20, 2025

MiMo-Embodied: X-Embodied Foundation Model Technical Report

arXiv:2511.16518v1 | 18 citations | h-index: 5 | Has Code
Originality: Highly original
AI Analysis

This work addresses the need for integrated AI models in robotics and autonomous systems, representing a novel advancement rather than an incremental improvement.

The authors build a cross-embodied foundation model that spans both Autonomous Driving and Embodied AI. It sets new records on 29 benchmarks (17 in embodied AI and 12 in autonomous driving) and significantly outperforms existing baselines.

We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and mutually reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.

