CV · AI · Apr 1

EgoSim: Egocentric World Simulator for Embodied Interaction Generation

arXiv:2604.01001 · 98.82 citations
Predicted impact: top 2% in CV · last 90 days
Originality: Highly original
AI Analysis

This work addresses the challenge of realistic embodied interaction simulation for robotics and AI applications, though it is incremental in that it builds on prior egocentric simulators.

The paper tackles the problem of generating spatially consistent and persistent egocentric interaction videos. It introduces EgoSim, a closed-loop simulator that updates 3D scene states, and reports that it significantly outperforms existing methods in visual quality, spatial consistency, and generalization to complex scenes and dexterous interactions.

We introduce EgoSim, a closed-loop egocentric world simulator that generates spatially consistent interaction videos and persistently updates the underlying 3D scene state for continuous simulation. Existing egocentric simulators either lack explicit 3D grounding, causing structural drift under viewpoint changes, or treat the scene as static, failing to update world states across multi-stage interactions. EgoSim addresses both limitations by modeling 3D scenes as updatable world states. We generate embodiment interactions via a Geometry-action-aware Observation Simulation model, with spatial consistency from an Interaction-aware State Updating module. To overcome the critical data bottleneck posed by the difficulty of acquiring densely aligned scene-interaction training pairs, we design a scalable pipeline that extracts static point clouds, camera trajectories, and embodiment actions from large-scale, in-the-wild monocular egocentric videos. We further introduce EgoCap, a capture system that enables low-cost real-world data collection with uncalibrated smartphones. Extensive experiments demonstrate that EgoSim significantly outperforms existing methods in terms of visual quality, spatial consistency, and generalization to complex scenes and in-the-wild dexterous interactions, while supporting cross-embodiment transfer to robotic manipulation. Code and datasets will be released soon. The project page is at egosimulator.github.io.
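The closed-loop idea in the abstract (simulate an observation from the current 3D state, then write the result back into that state before the next step) can be illustrated with a toy sketch. This is not the authors' implementation: `WorldState`, `Action`, `simulate_observation`, `update_state`, and `run_closed_loop` are hypothetical names, the "scene" is a handful of named 3D points rather than a point cloud, and the learned video model is replaced by simple translation arithmetic.

```python
from dataclasses import dataclass, field

Point = tuple[float, float, float]


@dataclass
class WorldState:
    """Toy stand-in for an updatable 3D scene state (assumption: named points)."""
    objects: dict[str, Point] = field(default_factory=dict)


@dataclass
class Action:
    """Hypothetical embodiment action: translate one named object by an offset."""
    target: str
    offset: Point


def simulate_observation(state: WorldState, camera: Point, action: Action) -> dict[str, Point]:
    """Placeholder for an observation model conditioned on geometry and action.

    Returns each object's position relative to the camera after the action.
    """
    obs = {}
    for name, (x, y, z) in state.objects.items():
        if name == action.target:
            dx, dy, dz = action.offset
            x, y, z = x + dx, y + dy, z + dz
        obs[name] = (x - camera[0], y - camera[1], z - camera[2])
    return obs


def update_state(camera: Point, obs: dict[str, Point]) -> WorldState:
    """Placeholder for a state-update step: lift the observation back to world coordinates."""
    return WorldState({
        name: (x + camera[0], y + camera[1], z + camera[2])
        for name, (x, y, z) in obs.items()
    })


def run_closed_loop(state: WorldState, steps: list[tuple[Point, Action]]):
    """Alternate observation simulation and state updating so that changes persist."""
    frames = []
    for camera, action in steps:
        obs = simulate_observation(state, camera, action)
        frames.append(obs)
        state = update_state(camera, obs)  # the next step sees the moved object
    return state, frames
```

The point of the sketch is the loop structure: because the state is rewritten after every step, an object moved in one interaction stays moved when the camera later looks at it from another viewpoint, which is the persistence property a static-scene simulator would lack.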

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
