MatPhys: Learning Material-Aware Physics Parameters for Deformable Object Simulation from Videos
MatPhys addresses two limitations of existing physics-driven reconstruction (the homogeneous-material assumption and inconsistent parameter estimation across scenes), enabling more robust and generalizable reconstruction of physical digital twins for vision, graphics, and robotics.
MatPhys predicts spring-mass parameters for deformable objects from single-view videos. It achieves reconstruction and future prediction comparable to per-scene optimization, while generalizing better to unseen interactions and objects and producing more consistent physical parameters.
Reconstructing simulation-ready deformable objects is important for vision, graphics, and robotics. Existing physics-driven methods can recover physical digital twins from videos, but they suffer from two fundamental limitations: they typically assume a homogeneous material across the whole object, and their scene-specific inverse optimization, combined with the inherent ambiguity of monocular observation, yields inconsistent parameters for the same material across different scenes or interactions. We propose MatPhys, a material-aware feed-forward framework that predicts spring-mass parameters from a single-view video and addresses these two issues with two coupled designs. To relax the homogeneous-material assumption, we use DINO features to decompose the object into semantically meaningful parts and to query a part-level material prior, assigning each part its own physical behavior. To enforce cross-scene consistency, we introduce a learned material codebook of shared material embeddings as the bridge between appearance and physics, and we further use the part-level prior as a reference distribution that constrains the decoder, so that the same material yields consistent parameters across scenes and interactions. Together, these designs turn an under-constrained monocular problem into feed-forward inference grounded in shared, reusable material concepts. Experiments show that our method matches per-scene optimization baselines in reconstruction and future prediction, while generalizing better to unseen interactions and objects and producing more consistent physical parameters.
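The abstract describes the pipeline only at a high level: part features query a shared material codebook, a decoder maps material embeddings to spring-mass parameters, and a part-level prior acts as a reference distribution on that mapping. The following is a minimal illustrative sketch of those ideas under explicit assumptions. It uses soft cosine-similarity assignment to the codebook, a linear softplus decoder for per-part stiffness, a KL penalty toward a prior over codebook entries, and averaging of endpoint stiffness for springs that cross parts. All function names, shapes, and these specific choices are hypothetical and are not taken from the MatPhys implementation.

```python
import numpy as np

# Hypothetical sketch: names, shapes, and the KL-to-prior formulation are
# illustrative assumptions, not the MatPhys implementation.

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def query_codebook(part_feats, codebook, tau=0.05):
    """Soft-assign each part to shared material embeddings.
    part_feats: (P, D) pooled per-part features (e.g. averaged DINO features).
    codebook:   (K, D) material embeddings shared across all scenes.
    Returns (P, K) assignment probabilities and (P, D) mixed embeddings."""
    f = part_feats / np.linalg.norm(part_feats, axis=1, keepdims=True)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    probs = softmax(f @ c.T / tau, axis=1)  # temperature-scaled cosine similarity
    return probs, probs @ codebook

def decode_stiffness(emb, w, b):
    """Hypothetical linear decoder; softplus keeps each part's stiffness positive."""
    return np.logaddexp(0.0, emb @ w + b)

def kl_to_prior(probs, prior, eps=1e-8):
    """Per-part KL(probs || prior): penalizes material distributions that drift
    away from a part-level reference distribution over codebook entries."""
    return np.sum(probs * (np.log(probs + eps) - np.log(prior + eps)), axis=1)

def spring_forces(x, springs, rest_len, k):
    """Hooke's-law forces for a spring-mass system with per-spring stiffness k."""
    f = np.zeros_like(x)
    i, j = springs[:, 0], springs[:, 1]
    d = x[j] - x[i]
    length = np.linalg.norm(d, axis=1)
    mag = (k * (length - rest_len))[:, None] * d / length[:, None]
    np.add.at(f, i, mag)   # a stretched spring pulls i toward j
    np.add.at(f, j, -mag)  # and j toward i (equal and opposite)
    return f

# Usage: three parts; parts 1 and 2 look like the same material.
rng = np.random.default_rng(0)
K, D = 4, 8
codebook = rng.normal(size=(K, D))
part_feats = codebook[[0, 2, 2]] + 0.01 * rng.normal(size=(3, D))
probs, emb = query_codebook(part_feats, codebook)
k_part = decode_stiffness(emb, rng.normal(size=D), 1.0)

# Two particles on different parts; the spring averages its endpoints' stiffness.
labels = np.array([0, 1])
springs = np.array([[0, 1]])
k_spring = 0.5 * (k_part[labels[springs[:, 0]]] + k_part[labels[springs[:, 1]]])
x = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
forces = spring_forces(x, springs, np.array([1.0]), k_spring)
```

Because parts 1 and 2 map to the same codebook entry, they receive nearly identical embeddings and therefore similar stiffness. This is the intuition behind routing appearance through shared material embeddings, although the actual consistency mechanism in MatPhys may differ from this sketch.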