Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning
This paper addresses the challenge of generalizable physical reasoning for autonomous robots in real-world settings, though the contribution appears incremental because it combines existing components into a unified framework.
The paper tackles the problem of enabling robots to reason about physical consequences in unstructured environments. It introduces the Scan, Materialize, Simulate (SMS) framework, which integrates 3D Gaussian Splatting, visual foundation models, and physics simulation. The authors report robust performance on a billiards-inspired manipulation task and a quadrotor landing task, in both simulated domain transfer and real-world experiments.
Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.
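To make the three stages concrete, the following is a minimal toy sketch of a scan, materialize, simulate, and plan loop. It is not the authors' implementation: the names `SceneObject`, `scan`, `materialize`, `simulate`, and `plan` are hypothetical, the scan and material-inference steps are replaced by hard-coded stubs, and the physics is a one-dimensional sliding-friction model chosen only for illustration, assuming a ball that decelerates at a constant rate of friction times gravity.

```python
from __future__ import annotations

from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class SceneObject:
    """One segmented object. All fields are illustrative assumptions."""
    name: str
    position: float        # 1D position along the table, in metres
    friction: float = 0.0  # filled in by the materialize step


def scan() -> list[SceneObject]:
    # Stand-in for scene reconstruction and segmentation (3D Gaussian
    # Splatting plus a visual foundation model in the paper).
    # Here it simply returns a hard-coded object.
    return [SceneObject("ball", position=0.0)]


def materialize(objects: list[SceneObject],
                lookup: dict[str, float]) -> list[SceneObject]:
    # Stand-in for vision-language-model material inference:
    # a dictionary lookup of an assumed friction coefficient per object.
    for obj in objects:
        obj.friction = lookup[obj.name]
    return objects


def simulate(obj: SceneObject, speed: float, dt: float = 1e-3) -> float:
    """Integrate the motion with explicit Euler steps until the object
    stops, and return its final position."""
    x, v = obj.position, speed
    while v > 0.0:
        x += v * dt
        v -= obj.friction * G * dt  # constant friction deceleration
    return x


def plan(obj: SceneObject, target: float, candidates: list[float]) -> float:
    # Object-centric planning by search: simulate every candidate launch
    # speed and keep the one whose stopping point is closest to the target.
    return min(candidates, key=lambda s: abs(simulate(obj, s) - target))


if __name__ == "__main__":
    ball = materialize(scan(), {"ball": 0.2})[0]
    speeds = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    print("chosen launch speed:", plan(ball, target=1.0, candidates=speeds))
```

The point of the sketch is the division of labour: the planner never learns dynamics, it only queries a simulator whose parameters were supplied by the earlier stages. Swapping the stubs for real perception and a full physics engine would leave the `plan` loop unchanged in structure.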