MotionAnymesh: Physics-Grounded Articulation for Simulation-Ready Digital Twins
MotionAnymesh targets the creation of simulation-ready digital twins for embodied AI and robotic simulation. The contribution appears incremental: it builds on existing zero-shot pipelines and adds physical grounding.
The paper addresses the conversion of static 3D meshes into articulated assets for embodied AI and robotics, targeting two failure modes: kinematic hallucinations and mesh inter-penetration. The resulting framework, MotionAnymesh, is reported to outperform state-of-the-art baselines in geometric precision and dynamic physical executability.
Converting static 3D meshes into interactable articulated assets is crucial for embodied AI and robotic simulation. However, existing zero-shot pipelines struggle with complex assets due to a critical lack of physical grounding. Specifically, ungrounded Vision-Language Models (VLMs) frequently suffer from kinematic hallucinations, while unconstrained joint estimation inevitably leads to catastrophic mesh inter-penetration during physical simulation. To bridge this gap, we propose MotionAnymesh, an automated zero-shot framework that seamlessly transforms unstructured static meshes into simulation-ready digital twins. Our method features a kinematic-aware part segmentation module that grounds VLM reasoning with explicit SP4D physical priors, effectively eradicating kinematic hallucinations. Furthermore, we introduce a geometry-physics joint estimation pipeline that combines robust type-aware initialization with physics-constrained trajectory optimization to rigorously guarantee collision-free articulation. Extensive experiments demonstrate that MotionAnymesh significantly outperforms state-of-the-art baselines in both geometric precision and dynamic physical executability, providing highly reliable assets for downstream applications.
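To make the inter-penetration problem concrete, the following is a minimal sketch of a collision check along a joint trajectory. It is not the paper's method. It assumes a revolute joint, approximates the static body by an axis-aligned box, and represents the moving part by a few sample points. The function names (`rotate_about_axis`, `inside_box`, `penetrating_angles`) and the cabinet-door geometry are hypothetical and chosen only for illustration.

```python
import math


def rotate_about_axis(point, origin, axis, angle):
    """Rotate `point` by `angle` radians about the line through `origin`
    with direction `axis`, using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)                 # unit joint axis
    v = tuple(p - o for p, o in zip(point, origin))   # point relative to the pivot
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    rotated = tuple(
        vi * cos_a + ci * sin_a + ki * k_dot_v * (1.0 - cos_a)
        for vi, ci, ki in zip(v, k_cross_v, k)
    )
    return tuple(r + o for r, o in zip(rotated, origin))


def inside_box(point, box_min, box_max, margin=1e-6):
    """True if `point` lies strictly inside the box, ignoring surface contact."""
    return all(
        lo + margin < p < hi - margin
        for p, lo, hi in zip(point, box_min, box_max)
    )


def penetrating_angles(part_points, origin, axis, box_min, box_max, angles):
    """Return the sampled joint angles at which any part point enters the box."""
    bad = []
    for angle in angles:
        moved = (rotate_about_axis(p, origin, axis, angle) for p in part_points)
        if any(inside_box(q, box_min, box_max) for q in moved):
            bad.append(angle)
    return bad


# Hypothetical toy asset: a unit-cube cabinet body with a door on the y = 1 face,
# hinged along the z-axis through the point (0, 1, 0).
body_min, body_max = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
door_points = [(0.0, 1.0, 0.5), (0.5, 1.0, 0.5), (1.0, 1.0, 0.5)]
hinge_origin, hinge_axis = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

bad_angles = penetrating_angles(
    door_points, hinge_origin, hinge_axis, body_min, body_max, [-0.5, 0.0, 0.5]
)
print(bad_angles)  # [-0.5]
```

In this toy setup the door swings outward for positive angles, so only the negative sample is flagged. A real pipeline would need mesh-level collision queries and an optimizer over the joint parameters rather than a box test over fixed samples.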