h-index28
7papers
57citations
Novelty42%
AI Score55

7 Papers

ROMay 18
WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

Yu Shang, Yinzhou Tang, Yiding Ma et al.

World models have emerged as a central paradigm for embodied intelligence, enabling agents to predict action-conditioned future and reason about environmental dynamics. However, existing embodied world model benchmarks are still largely confined to vision-only prediction, offline embodied applications, and simulator-based evaluation, making them insufficient for assessing increasingly comprehensive world models. In this work, we introduce WorldArena 2.0, an expanded benchmark that systematically broadens embodied world model evaluation along three dimensions: modality, functionality, and platform. Along the modality dimension, WorldArena 2.0 extends evaluation from vision-only to visuotactile modalities, enabling assessment of multimodal perception and prediction. Along the functionality dimension, it extends beyond policy evaluation and planning to assess world models as interactive RL environments for policy optimization. Along the platform dimension, it moves beyond simulator-only evaluation to a diverse suite of simulated and real-world robotic settings across multiple embodiments. Under a standardized protocol, WorldArena 2.0 comprehensively evaluates perceptual quality, interactive utility, and cross-platform performance, providing a comprehensive testbed for tracking progress toward embodied world models. The benchmark is available at: https://world-arena.ai.

CVJun 29, 2025Code
RoboScape: Physics-informed Embodied World Model

Yu Shang, Xin Zhang, Yinzhou Tang et al.

World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction that enhances 3D geometric consistency in video rendering, and keypoint dynamics learning that implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling. Extensive experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. We further validate its practical utility through downstream applications including robotic policy training with generated data and policy evaluation. Our work provides new insights for building efficient physics-informed world models to advance embodied intelligence research. The code is available at: https://github.com/tsinghua-fib-lab/RoboScape.

CVFeb 9
WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models

Yu Shang, Zhuohang Li, Yiding Ma et al.

While world models have emerged as a cornerstone of embodied intelligence by enabling agents to reason about environmental dynamics through action-conditioned prediction, their evaluation remains fragmented. Current evaluation of embodied world models has largely focused on perceptual fidelity (e.g., video generation quality), overlooking the functional utility of these models in downstream decision-making tasks. In this work, we introduce WorldArena, a unified benchmark designed to systematically evaluate embodied world models across both perceptual and functional dimensions. WorldArena assesses models through three dimensions: video perception quality, measured with 16 metrics across six sub-dimensions; embodied task functionality, which evaluates world models as data engines, policy evaluators, and action planners integrating with subjective human evaluation. Furthermore, we propose EWMScore, a holistic metric integrating multi-dimensional performance into a single interpretable index. Through extensive experiments on 14 representative models, we reveal a significant perception-functionality gap, showing that high visual quality does not necessarily translate into strong embodied task capability. WorldArena benchmark with the public leaderboard is released at https://worldarena.ai, providing a framework for tracking progress toward truly functional world models in embodied AI.

SIFeb 26, 2025Code
Predicting Cascade Failures in Interdependent Urban Infrastructure Networks

Yinzhou Tang, Jinghua Piao, Huandong Wang et al.

Cascading failures (CF) entail component breakdowns spreading through infrastructure networks, causing system-wide collapse. Predicting CFs is of great importance for infrastructure stability and urban function. Despite extensive research on CFs in single networks such as electricity and road networks, interdependencies among diverse infrastructures remain overlooked, and capturing intra-infrastructure CF dynamics amid complex evolutions poses challenges. To address these gaps, we introduce the \textbf{I}ntegrated \textbf{I}nterdependent \textbf{I}nfrastructure CF model ($I^3$), designed to capture CF dynamics both within and across infrastructures. $I^3$ employs a dual GAE with global pooling for intra-infrastructure dynamics and a heterogeneous graph for inter-infrastructure interactions. An initial node enhancement pre-training strategy mitigates GCN-induced over-smoothing. Experiments demonstrate $I^3$ achieves a 31.94\% in terms of AUC, 18.03\% in terms of Precision, 29.17\% in terms of Recall, 22.73\% in terms of F1-score boost in predicting infrastructure failures, and a 28.52\% reduction in terms of RMSE for cascade volume forecasts compared to leading models. It accurately pinpoints phase transitions in interconnected and singular networks, rectifying biases in models tailored for singular networks. Access the code at https://github.com/tsinghua-fib-lab/Icube.

RODec 3, 2025
RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL

Yinzhou Tang, Yu Shang, Yinuo Chen et al.

Achieving generalizable embodied policies remains a key challenge. Traditional policy learning paradigms, including both Imitation Learning (IL) and Reinforcement Learning (RL), struggle to cultivate generalizability across diverse scenarios. While IL policies often overfit to specific expert trajectories, RL suffers from the inherent lack of a unified and general reward signal necessary for effective multi-scene generalization. We posit that the world model is uniquely capable of serving as a universal environment proxy to address this limitation. However, current world models primarily focus on their ability to predict observations and still rely on task-specific, handcrafted reward functions, thereby failing to provide a truly general training environment. Toward this problem, we propose RoboScape-R, a framework leveraging the world model to serve as a versatile, general-purpose proxy for the embodied environment within the RL paradigm. We introduce a novel world model-based general reward mechanism that generates ''endogenous'' rewards derived from the model's intrinsic understanding of real-world state transition dynamics. Extensive experiments demonstrate that RoboScape-R effectively addresses the limitations of traditional RL methods by providing an efficient and general training environment that substantially enhances the generalization capability of embodied policies. Our approach offers critical insights into utilizing the world model as an online training strategy and achieves an average 37.5% performance improvement over baselines under out-of-domain scenarios.

LGJul 26, 2025Code
Predicting Human Mobility in Disasters via LLM-Enhanced Cross-City Learning

Yinzhou Tang, Huandong Wang, Xiaochen Fan et al.

The vulnerability of cities to natural disasters has increased with urbanization and climate change, making it more important to predict human mobility in the disaster scenarios for downstream tasks including location-based early disaster warning and pre-allocating rescue resources, etc. However, existing human mobility prediction models are mainly designed for normal scenarios, and fail to adapt to disaster scenarios due to the shift of human mobility patterns under disaster. To address this issue, we introduce \textbf{DisasterMobLLM}, a mobility prediction framework for disaster scenarios that can be integrated into existing deep mobility prediction methods by leveraging LLMs to model the mobility intention and transferring the common knowledge of how different disasters affect mobility intentions between cities. This framework utilizes a RAG-Enhanced Intention Predictor to forecast the next intention, refines it with an LLM-based Intention Refiner, and then maps the intention to an exact location using an Intention-Modulated Location Predictor. Extensive experiments illustrate that DisasterMobLLM can achieve a 32.8\% improvement in terms of Acc@1 and a 35.0\% improvement in terms of the F1-score of predicting immobility compared to the baselines. The code is available at https://github.com/tsinghua-fib-lab/DisasterMobLLM.

SOC-PHJun 9, 2025
A Survey of Physics-Informed AI for Complex Urban Systems

En Xu, Huandong Wang, Yunke Zhang et al.

Urban systems are typical examples of complex systems, where the integration of physics-based modeling with artificial intelligence (AI) presents a promising paradigm for enhancing predictive accuracy, interpretability, and decision-making. In this context, AI excels at capturing complex, nonlinear relationships, while physics-based models ensure consistency with real-world laws and provide interpretable insights. We provide a comprehensive review of physics-informed AI methods in urban applications. The proposed taxonomy categorizes existing approaches into three paradigms - Physics-Integrated AI, Physics-AI Hybrid Ensemble, and AI-Integrated Physics - and further details seven representative methods. This classification clarifies the varying degrees and directions of physics-AI integration, guiding the selection and development of appropriate methods based on application needs and data availability. We systematically examine their applications across eight key urban domains: energy, environment, economy, transportation, information, public services, emergency management, and the urban system as a whole. Our analysis highlights how these methodologies leverage physical laws and data-driven models to address urban challenges, enhancing system reliability, efficiency, and adaptability. By synthesizing existing methodologies and their urban applications, we identify critical gaps and outline future research directions, paving the way toward next-generation intelligent urban system modeling.