MOSIV: Multi-Object System Identification from Videos
This work addresses the challenge of identifying material parameters in complex, multi-object interactions from video data. The contribution is incremental: it builds on existing simulation and identification methods but extends them to more realistic scenarios.
The paper tackles multi-object system identification from videos, a setting where prior methods are limited to single-object scenes or to discrete material classification. It introduces MOSIV, a framework that optimizes continuous per-object material parameters using a differentiable simulator with geometric objectives. On a new synthetic benchmark, MOSIV achieves substantial improvements in grounding accuracy and long-horizon simulation fidelity over adapted baselines.
We introduce the challenging problem of multi-object system identification from videos, for which prior methods are ill-suited due to their focus on single-object scenes or discrete material classification with a fixed set of material prototypes. To address this, we propose MOSIV, a new framework that directly optimizes for continuous, per-object material parameters using a differentiable simulator guided by geometric objectives derived from video. We also present a new synthetic benchmark with contact-rich, multi-object interactions to facilitate evaluation. On this benchmark, MOSIV substantially improves grounding accuracy and long-horizon simulation fidelity over adapted baselines, establishing it as a strong baseline for this new task. Our analysis shows that object-level fine-grained supervision and geometry-aligned objectives are critical for stable optimization in these complex, multi-object settings. The source code and dataset will be released.
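To make the core idea concrete, the following is a minimal sketch of gradient-based identification of one continuous material parameter per object. It is not the MOSIV implementation. Everything in it is an assumption made for illustration: each "object" is a unit mass on a 1D spring, the material parameter is the spring stiffness, the objects do not interact or come into contact, the "geometric objective" is a mean squared position error against observed trajectories, and the names `simulate`, `identify`, and `identify_all` are hypothetical. The simulator is made differentiable by propagating the sensitivity of the state with respect to the stiffness alongside the state itself.

```python
# Toy sketch (assumed setup, not the paper's method): recover a continuous
# stiffness per object by gradient descent through a differentiable simulator.

def simulate(k, steps=100, dt=0.01):
    """Symplectic-Euler rollout of a unit mass on a spring with stiffness k.

    Returns the positions and their exact derivatives with respect to k
    (forward-mode sensitivities of the discrete update).
    """
    x, v = 1.0, 0.0      # initial position and velocity
    dx, dv = 0.0, 0.0    # d(position)/dk and d(velocity)/dk
    xs, dxs = [], []
    for _ in range(steps):
        # Differentiate v_new = v - dt * k * x with respect to k (old x, dx).
        dv = dv - dt * (x + k * dx)
        v = v - dt * k * x
        # Differentiate x_new = x + dt * v_new with respect to k.
        dx = dx + dt * dv
        x = x + dt * v
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def identify(observed, k_init=5.0, lr=10.0, iters=2000):
    """Fit k by minimizing the mean squared position error to `observed`."""
    k = k_init
    n = len(observed)
    for _ in range(iters):
        xs, dxs = simulate(k, steps=n)
        # Gradient of mean((x_t - obs_t)^2) with respect to k.
        grad = 2.0 / n * sum((x - o) * d for x, o, d in zip(xs, observed, dxs))
        k -= lr * grad
    return k


def identify_all(observations):
    """Estimate one continuous parameter per object, independently."""
    return {name: identify(traj) for name, traj in observations.items()}


# Synthetic "observations" generated from hidden ground-truth stiffnesses.
true_k = {"soft": 4.0, "stiff": 9.0}
observations = {name: simulate(k)[0] for name, k in true_k.items()}
estimates = identify_all(observations)
```

In this toy setting each object is fitted independently from its own trajectory, which loosely mirrors the object-level supervision the abstract describes. A real multi-object scene couples the objects through contact, so the parameters would have to be optimized jointly through a shared simulation.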