RO · CV · May 18

WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

Yu Shang, Yinzhou Tang, Yiding Ma, Zhuohang Li, Lei Jin, Weikang Su, Xin Jin, Zhaolu Wang, Ziyou Wang, Xin Zhang, Haisheng Su, Weizhen He

arXiv:2605.17912

Predicted impact: top 1% in RO · last 90 days · Originality: Incremental advance

AI Analysis

This benchmark addresses the need for comprehensive evaluation of embodied world models. Such evaluation matters to researchers developing multimodal, interactive, and real-world-capable AI agents.

WorldArena 2.0 extends embodied world model benchmarking across three dimensions: modality (vision to visuotactile), functionality (policy evaluation to interactive RL environments), and platform (simulators to real-world robots). It provides a standardized protocol for evaluating perceptual quality, interactive utility, and cross-platform performance.

World models have emerged as a central paradigm for embodied intelligence, enabling agents to predict action-conditioned futures and reason about environmental dynamics. However, existing embodied world model benchmarks are still largely confined to vision-only prediction, offline embodied applications, and simulator-based evaluation, making them insufficient for assessing increasingly capable world models. In this work, we introduce WorldArena 2.0, an expanded benchmark that systematically broadens embodied world model evaluation along three dimensions: modality, functionality, and platform. Along the modality dimension, WorldArena 2.0 extends evaluation from vision-only to visuotactile modalities, enabling assessment of multimodal perception and prediction. Along the functionality dimension, it extends beyond policy evaluation and planning to assess world models as interactive RL environments for policy optimization. Along the platform dimension, it moves beyond simulator-only evaluation to a diverse suite of simulated and real-world robotic settings across multiple embodiments. Under a standardized protocol, WorldArena 2.0 evaluates perceptual quality, interactive utility, and cross-platform performance, providing a comprehensive testbed for tracking progress toward embodied world models. The benchmark is available at: https://world-arena.ai.
