CV · Jan 14

PhyRPR: Training-Free Physics-Constrained Video Generation

arXiv:2601.09255v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses physically implausible videos in AI-generated content and offers a method for more controllable video generation. It is an incremental advance, since it builds on existing diffusion models.

The paper tackles the failure of video generation models to satisfy physical constraints. It proposes PhyRPR, a training-free three-stage pipeline that decouples physical understanding from visual synthesis, and reports improved physical plausibility and motion controllability in experiments.

Recent diffusion-based video generation models can synthesize visually plausible videos, yet they often struggle to satisfy physical constraints. A key reason is that most existing approaches remain single-stage: they entangle high-level physical understanding with low-level visual synthesis, making it hard to generate content that requires explicit physical reasoning. To address this limitation, we propose a training-free three-stage pipeline, PhyRPR (PhyReason → PhyPlan → PhyRefine), which decouples physical understanding from visual synthesis. Specifically, PhyReason uses a large multimodal model for physical state reasoning and an image generator for keyframe synthesis; PhyPlan deterministically synthesizes a controllable coarse motion scaffold; and PhyRefine injects this scaffold into diffusion sampling via a latent fusion strategy to refine appearance while preserving the planned dynamics. This staged design enables explicit physical control during generation. Extensive experiments under physics constraints show that our method consistently improves physical plausibility and motion controllability.
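The abstract describes the pipeline only at a high level, so the following is a minimal sketch of how three decoupled stages could be wired together, not the authors' implementation. Several assumptions are made. The reasoning stage (a large multimodal model plus an image generator in the paper) is replaced by a stub that returns fixed initial conditions. The "coarse motion scaffold" is taken to be a single projectile trajectory rendered as one-hot occupancy grids. The latent fusion is assumed to be a masked blend that writes the noised scaffold into the sample during the early, high-noise steps, an SDEdit/RePaint-style blend that may differ from the paper's strategy. All names (`phy_reason`, `phy_plan`, `rasterize`, `fuse_latents`, `phy_refine`) and the toy "denoiser" are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicalState:
    """Hypothetical reasoning-stage output: initial state of a single object."""
    x0: float
    y0: float
    vx: float
    vy: float
    g: float = 9.81


def phy_reason(prompt: str) -> PhysicalState:
    """Stand-in for PhyReason. The paper uses a large multimodal model and an
    image generator; this stub ignores the prompt and returns fixed values."""
    return PhysicalState(x0=0.0, y0=0.0, vx=2.0, vy=5.0)


def phy_plan(state: PhysicalState, num_frames: int, dt: float) -> list[tuple[float, float]]:
    """Deterministic coarse scaffold: closed-form projectile motion, clamped
    so the object never goes below the ground plane y = 0."""
    traj = []
    for i in range(num_frames):
        t = i * dt
        x = state.x0 + state.vx * t
        y = state.y0 + state.vy * t - 0.5 * state.g * t * t
        traj.append((x, max(y, 0.0)))
    return traj


def rasterize(traj, width: int, height: int, scale: float) -> list[list[float]]:
    """Render each trajectory point as a flattened one-hot occupancy grid."""
    frames = []
    for x, y in traj:
        grid = [0.0] * (width * height)
        col = min(int(max(x, 0.0) * scale), width - 1)
        row = height - 1 - min(int(max(y, 0.0) * scale), height - 1)  # row 0 = top
        grid[row * width + col] = 1.0
        frames.append(grid)
    return frames


def fuse_latents(z, scaffold, mask, sigma, rng):
    """Assumed latent fusion: inside the mask, overwrite the sample with the
    scaffold noised to level sigma; outside, keep the model's sample."""
    return [m * (s + sigma * rng.gauss(0.0, 1.0)) + (1.0 - m) * zi
            for zi, s, m in zip(z, scaffold, mask)]


def phy_refine(scaffold, denoise_step, num_steps=10, inject_frac=0.6, seed=0):
    """Toy sampling loop: fuse the scaffold during early, high-noise steps,
    then let the (hypothetical) denoiser run unconstrained."""
    rng = random.Random(seed)
    mask = [1.0 if s > 0 else 0.0 for s in scaffold]  # constrain object cells only
    z = [rng.gauss(0.0, 1.0) for _ in scaffold]
    for k in range(num_steps):
        sigma = 1.0 - k / num_steps  # linear schedule from 1.0 down to 1/num_steps
        if k < inject_frac * num_steps:
            z = fuse_latents(z, scaffold, mask, sigma, rng)
        z = denoise_step(z, sigma)
    return z


if __name__ == "__main__":
    state = phy_reason("a ball is thrown forward and lands on the ground")
    traj = phy_plan(state, num_frames=8, dt=0.15)
    frames = rasterize(traj, width=8, height=8, scale=2.0)
    # Placeholder "denoiser": mild shrinkage; a real system would call a video diffusion model.
    shrink = lambda z, sigma: [(1.0 - 0.1 * sigma) * v for v in z]
    video = [phy_refine(f, shrink) for f in frames]
    print(len(video), len(video[0]))  # 8 64
```

Limiting fusion to the early steps is one common way to fix coarse layout and motion while leaving fine appearance to the generative model. The `inject_frac` value here is an illustrative choice, not a setting reported in the paper.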

