PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
This work addresses the need for more physically realistic and controllable video generation in computer vision and graphics, though the contribution is incremental in that it builds on existing diffusion models and physics simulators.
The paper tackles the lack of physical plausibility and 3D controllability in video generation models. It introduces PhysCtrl, a framework that generates physics-grounded motion trajectories, which are then used to drive image-to-video models. The authors report that the resulting videos are high-fidelity and controllable, and that they outperform existing methods in both visual quality and physical plausibility.
Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporates physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform existing methods in both visual quality and physical plausibility. Project Page: https://cwchenwang.github.io/physctrl
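To make the idea of spatiotemporal attention over 3D point trajectories more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a factorized design in which points first attend to each other within a frame (a rough stand-in for particle interactions) and each point then attends over its own trajectory across frames. The function names, tensor shapes, single-head attention, and random weights are all illustrative assumptions; conditioning on physics parameters and forces, the diffusion process, and the physics-based training constraints are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h, wq, wk, wv):
    # Single-head self-attention over the second-to-last axis of h: (..., L, D).
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def spatiotemporal_block(h, params):
    # h: (T, N, D) features for N points over T frames (hypothetical layout).
    # Spatial step: within each frame, every point attends to all other points.
    h = h + self_attention(h, *params["spatial"])
    # Temporal step: each point attends over its own trajectory across frames.
    ht = np.swapaxes(h, 0, 1)  # (N, T, D)
    ht = ht + self_attention(ht, *params["temporal"])
    return np.swapaxes(ht, 0, 1)  # back to (T, N, D)


rng = np.random.default_rng(0)
T, N, D = 8, 16, 32  # frames, points, feature width (assumed values)

# Toy 3D point trajectories and a random linear embedding into D features.
traj = rng.normal(size=(T, N, 3))
w_in = rng.normal(size=(3, D)) * 0.1

# Random, untrained projection weights for the two attention steps.
params = {
    name: [rng.normal(size=(D, D)) * 0.1 for _ in range(3)]
    for name in ("spatial", "temporal")
}

out = spatiotemporal_block(traj @ w_in, params)
print(out.shape)
```

Factorizing attention this way keeps the cost at roughly O(T·N² + N·T²) score entries rather than O((T·N)²) for full joint attention over all point-frame pairs, which is one plausible reason to separate the two axes when the number of points is large.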