FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models
This work addresses motion coherence issues in video generation for robotics applications, offering incremental improvements over existing flow supervision methods.
The paper tackles temporally incoherent motion in Video Diffusion Models by proposing FlowLoss, a dynamic flow-conditioned loss strategy that directly compares flow fields extracted from generated and ground-truth videos and applies a noise-aware weighting scheme across denoising steps. On robotic video datasets, the reported results suggest improved motion stability and faster convergence in early training stages.
Video Diffusion Models (VDMs) can generate high-quality videos, but often struggle with producing temporally coherent motion. Optical flow supervision is a promising approach to address this, with prior works commonly employing warping-based strategies that avoid explicit flow matching. In this work, we explore an alternative formulation, FlowLoss, which directly compares flow fields extracted from generated and ground-truth videos. To account for the unreliability of flow estimation under high-noise conditions in diffusion, we propose a noise-aware weighting scheme that modulates the flow loss across denoising steps. Experiments on robotic video datasets suggest that FlowLoss improves motion stability and accelerates convergence in early training stages. Our findings offer practical insights for incorporating motion-based supervision into noise-conditioned generative models.
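To make the idea of a noise-weighted flow loss concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the weighting function, the flow distance, or how the terms are combined, so everything here is an assumption made for illustration: the polynomial schedule in `noise_aware_weight`, the endpoint-error distance, the additive combination with the diffusion loss, and the parameters `gamma` and `lam`. The flow arrays stand in for the output of an optical flow estimator applied to generated and ground-truth videos.

```python
import numpy as np


def noise_aware_weight(t, num_steps, gamma=2.0):
    """Hypothetical schedule: 1 at t = 0 (low noise), decaying to 0 at t = num_steps."""
    frac = np.clip(t / num_steps, 0.0, 1.0)
    return float((1.0 - frac) ** gamma)


def flow_endpoint_error(flow_gen, flow_gt):
    """Mean Euclidean distance between flow vectors.

    Both arrays have shape (..., H, W, 2), with the last axis holding (dx, dy).
    """
    diff = flow_gen - flow_gt
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))


def total_loss(diffusion_loss, flow_gen, flow_gt, t, num_steps, lam=0.1, gamma=2.0):
    """Assumed combination: diffusion loss plus a noise-weighted flow term."""
    w = noise_aware_weight(t, num_steps, gamma)
    return diffusion_loss + lam * w * flow_endpoint_error(flow_gen, flow_gt)


if __name__ == "__main__":
    # Placeholder flows for 4 frame pairs of size 8x8; a real pipeline would
    # obtain these from a flow estimator run on generated and reference clips.
    rng = np.random.default_rng(0)
    flow_gt = rng.normal(size=(4, 8, 8, 2))
    flow_gen = flow_gt + 0.1 * rng.normal(size=(4, 8, 8, 2))
    for t in (0, 500, 1000):
        print(t, total_loss(1.0, flow_gen, flow_gt, t, num_steps=1000))
```

Under this assumed schedule the flow term contributes fully at low-noise steps and vanishes at the noisiest step, which reflects the abstract's motivation that flow estimates are unreliable under high noise.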