CV · AI · GR · LG · May 23

Φ-Noise: Training-Free Temporal Video Conditioning via Phase-Based Noise Manipulation

arXiv:2605.24509 · 22.4
Predicted impact: top 31% in CV · last 90 days · Originality: Synthesis-oriented
AI Analysis

The method offers a simple, training-free approach to temporal video conditioning, which reduces computational overhead for practitioners.

The paper introduces a training-free method for motion-conditioned video generation by injecting low-frequency phase information from a reference video into diffusion noise latents, achieving competitive or superior results without model modifications.

Latent video diffusion models generate videos by progressively transforming Gaussian noise into realistic samples conditioned on text or visual inputs. However, existing conditioning methods often require additional training and computational overhead. Motivated by recent findings on the importance of frequency components in generative models, we propose a simple, training-free approach for motion-conditioned video generation by injecting low-frequency phase information from a reference video directly into the diffusion noise latents. Our method transfers motion cues without modifying the model architecture or inference pipeline. Using several applications, we demonstrate effective control over both appearance and dynamics in generated videos, while achieving competitive or superior results compared to more complex conditioning approaches.
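The core idea in the abstract, injecting low-frequency phase from a reference into the noise latents, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `low_freq_mask` and `inject_low_freq_phase`, the radial cutoff, the choice of a joint 3D FFT over (time, height, width), and the use of random arrays as stand-in latents are all assumptions made for illustration. The sketch keeps the magnitude spectrum of the Gaussian noise and replaces only the phase inside a low-frequency band with the reference's phase.

```python
import numpy as np


def low_freq_mask(shape, cutoff):
    """Boolean mask of FFT bins whose normalized frequency radius is <= cutoff.

    Frequencies come from np.fft.fftfreq, so they lie in [-0.5, 0.5) cycles
    per sample along each axis. The cutoff value is a hypothetical parameter.
    """
    grids = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))
    return radius <= cutoff


def inject_low_freq_phase(noise, reference, cutoff=0.1):
    """Replace the low-frequency phase of `noise` with that of `reference`.

    Assumption: both inputs are real arrays of identical shape, for example
    (frames, height, width) for a single latent channel.
    """
    noise_f = np.fft.fftn(noise)
    ref_f = np.fft.fftn(reference)
    mask = low_freq_mask(noise.shape, cutoff)

    # Reference phase inside the band, original noise phase elsewhere.
    phase = np.where(mask, np.angle(ref_f), np.angle(noise_f))

    # Keep the noise magnitude so the overall energy of the latent is unchanged.
    mixed = np.abs(noise_f) * np.exp(1j * phase)

    # Both inputs are real and the mask is symmetric, so the imaginary part
    # is only numerical error and can be dropped.
    return np.fft.ifftn(mixed).real


rng = np.random.default_rng(0)
reference = rng.standard_normal((8, 16, 16))  # stand-in for encoded reference latents
noise = rng.standard_normal((8, 16, 16))      # initial Gaussian noise latent
conditioned = inject_low_freq_phase(noise, reference, cutoff=0.15)
```

In this sketch the conditioned latent would simply replace the initial noise passed to an unmodified sampler, which matches the abstract's claim that neither the architecture nor the inference pipeline changes. How the paper selects the frequency band and which axes it transforms is not stated in this summary.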
