CV | RO | Jul 23, 2025

From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding

arXiv:2507.17585v1 | 1 citation | h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses data usability challenges facing researchers and practitioners in embodied AI and robotics, though its improvement over existing scan processing methods is incremental.

The paper tackles the problem of leveraging realistic 3D scene scans for downstream applications by proposing a unified annotation integration method built on USD (Universal Scene Description). The method achieves an 80% success rate in LLM-based scene editing and an 87% success rate in policy learning for robotic simulation.

Real-world 3D scene-level scans offer realism and can enable better real-world generalizability for downstream applications. However, challenges such as data volume, diverse annotation formats, and tool compatibility limit their use. This paper demonstrates a methodology to effectively leverage these scans and their annotations. We propose a unified annotation integration using USD, with application-specific USD flavors. We identify challenges in utilizing holistic real-world scan datasets and present mitigation strategies. The efficacy of our approach is demonstrated through two downstream applications: LLM-based scene editing, enabling effective LLM understanding and adaptation of the data (80% success), and robotic simulation, achieving an 87% success rate in policy learning.

