CV · AI · LG · RO · May 5, 2025

MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans

arXiv:2505.02388v1 · 25 citations · h-index: 25 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses the scalability and manual-effort bottlenecks of 3D scene creation for Embodied AI research by offering a more automated approach.

The paper tackles the challenge of creating high-quality, diverse 3D scenes for Embodied AI with two contributions: MetaScenes, a large-scale dataset of 15366 objects across 831 categories, and Scan2Sim, a model for automated asset replacement. Together they reduce reliance on manual design and support tasks such as robotic manipulation and vision-and-language navigation.

Embodied AI (EAI) research requires high-quality, diverse 3D scenes to effectively support skill acquisition, sim-to-real transfer, and generalization. Achieving these quality standards, however, necessitates the precise replication of real-world object diversity. Existing datasets demonstrate that this process heavily relies on artist-driven designs, which demand substantial human effort and present significant scalability challenges. To scalably produce realistic and interactive 3D scenes, we first present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans, which includes 15366 objects spanning 831 fine-grained categories. Then, we introduce Scan2Sim, a robust multi-modal alignment model, which enables the automated, high-quality replacement of assets, thereby eliminating the reliance on artist-driven designs for scaling 3D scenes. We further propose two benchmarks to evaluate MetaScenes: a detailed scene synthesis task focused on small item layouts for robotic manipulation and a domain transfer task in vision-and-language navigation (VLN) to validate cross-domain transfer. Results confirm MetaScenes' potential to enhance EAI by supporting more generalizable agent learning and sim-to-real applications, introducing new possibilities for EAI research. Project website: https://meta-scenes.github.io/.

