CV · May 27, 2025

Think Before You Diffuse: Infusing Physical Rules into Video Diffusion

arXiv:2505.21653v3 · 7 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This paper addresses inaccurate physical effects in AI-generated videos, a problem that matters for applications such as simulation and content creation. It represents an incremental improvement that integrates existing models with physical constraints.

The paper tackles the challenge of generating physically correct videos by proposing DiffPhy, a framework that fine-tunes a pre-trained video diffusion model using large language models to infer and enforce physical rules from text prompts, achieving state-of-the-art results across diverse physics-related scenarios.

Recent video diffusion models have demonstrated a strong capability to generate visually pleasing results, yet synthesizing correct physical effects in generated videos remains challenging. The complexity of real-world motions, interactions, and dynamics makes learning physics from data especially difficult. In this work, we propose DiffPhy, a generic framework that enables physically correct and photo-realistic video generation by fine-tuning a pre-trained video diffusion model. Our method leverages large language models (LLMs) to infer rich physical context from the text prompt. To incorporate this context into the video diffusion model, we use a multimodal large language model (MLLM) to verify intermediate latent variables against the inferred physical rules, guiding the model's gradient updates accordingly. The textual output of the LLM is transformed into continuous signals. We then formulate a set of training objectives that jointly ensure physical accuracy and semantic alignment with the input text. Additionally, failure cases of physical phenomena are corrected via attention injection. We also establish a high-quality physical video dataset containing diverse physical actions and events to facilitate effective fine-tuning. Extensive experiments on public benchmarks demonstrate that DiffPhy produces state-of-the-art results across diverse physics-related scenarios. Our project page is available at https://bwgzk-keke.github.io/DiffPhy/.
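The abstract does not say how a textual verdict becomes a continuous signal or how the objectives are weighted. The sketch below illustrates one generic way such a combination could work, and every detail is an assumption rather than DiffPhy's actual implementation: it assumes access to the logits a verifier model assigns to "yes" and "no" for each inferred physical rule, converts them to P(yes) with a two-way softmax, and adds a physics penalty and a semantic-alignment penalty to the usual denoising loss. The names `verdict_to_score`, `LossWeights`, and `total_loss`, the 0.1 weights, and the cosine-similarity input are all hypothetical. Plain floats are used so the arithmetic is easy to follow; a real fine-tuning loop would use autograd tensors so gradients can flow back into the diffusion model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def verdict_to_score(logit_yes: float, logit_no: float) -> float:
    """Map a yes/no verdict to a continuous score in [0, 1] (hypothetical).

    Assumes the verifier exposes the logits for the "yes" and "no" answers.
    A two-way softmax yields P(yes), which changes smoothly with the logits.
    """
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


@dataclass(frozen=True)
class LossWeights:
    physics: float = 0.1   # hypothetical weight on the physics penalty
    semantic: float = 0.1  # hypothetical weight on the text-alignment penalty


def total_loss(
    denoise_loss: float,
    rule_logits: List[Tuple[float, float]],
    semantic_sim: float,
    w: LossWeights = LossWeights(),
) -> float:
    """Combine denoising, physics, and semantic terms (illustrative only).

    rule_logits: one (logit_yes, logit_no) pair per inferred physical rule.
    semantic_sim: assumed cosine similarity in [-1, 1] between video and text.
    """
    if rule_logits:
        scores = [verdict_to_score(y, n) for y, n in rule_logits]
        # Penalize the average probability that a rule is violated.
        physics_loss = sum(1.0 - s for s in scores) / len(scores)
    else:
        physics_loss = 0.0
    semantic_loss = 1.0 - semantic_sim  # zero when perfectly aligned
    return denoise_loss + w.physics * physics_loss + w.semantic * semantic_loss
```

For example, `total_loss(0.5, [(2.0, -1.0), (0.0, 0.0)], 0.8)` treats the first rule as likely satisfied and the second as uncertain, so the uncertain rule contributes most of the physics penalty. A soft probability is used instead of a hard yes/no because a binary verdict provides no graded signal to optimize against.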
