CV | Nov 29, 2023

SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning

Liang Peng, Haoran Cheng, Zheng Yang, Ruisi Zhao, Linxuan Xia, Chaotian Song, Qinglin Lu, Boxi Wu, Wei Liu

arXiv:2311.17536v22.82 citations | h-index: 17 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses temporal smoothness problems in videos produced by one-shot video tuning. The advance is incremental: it keeps existing tuning methods and adds a single new constraint to their training loss.

The paper tackles incoherence and inconsistency in videos generated by one-shot video tuning methods. It introduces a noise constraint across frames that regulates noise predictions, which significantly improves the consistency and smoothness of the generated videos.

Recent one-shot video tuning methods, which fine-tune the network on a specific video starting from pre-trained text-to-image models (e.g., Stable Diffusion), are popular in the community because of their flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video frames. The constraint regulates each frame's noise prediction with respect to its temporal neighbors, resulting in smooth latents. It can be included simply as an additional loss term during training. By applying the loss to existing one-shot video tuning methods, we significantly improve the overall consistency and smoothness of the generated videos. Furthermore, we argue that current video evaluation metrics inadequately capture smoothness. To address this, we introduce a novel metric that considers detailed features and their temporal dynamics. Experimental results validate the effectiveness of our approach in producing smoother videos on various one-shot video tuning baselines. The source code and video demos are available at https://github.com/SPengLiang/SmoothVideo.
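
The abstract does not give the exact form of the noise constraint, so the following is only a minimal sketch of one plausible reading: alongside the usual per-frame noise-prediction error, penalize any mismatch between the frame-to-frame differences of the predicted noise and those of the sampled noise. The function names, the representation of each frame as a flat list of floats, and the `weight` value are all assumptions made for illustration and are not taken from the paper or its repository.

```python
def frame_diffs(frames):
    """Element-wise difference between each frame and its next temporal neighbor."""
    return [[b - a for a, b in zip(cur, nxt)]
            for cur, nxt in zip(frames, frames[1:])]


def mse(xs, ys):
    """Mean squared error over two equally shaped lists of frames."""
    sq = [(x - y) ** 2 for fx, fy in zip(xs, ys) for x, y in zip(fx, fy)]
    return sum(sq) / len(sq)


def noise_constraint_loss(pred_noise, true_noise):
    """Assumed form of the cross-frame constraint (not confirmed by the abstract):
    the change in predicted noise between adjacent frames should match
    the change in the sampled noise between those same frames."""
    if len(pred_noise) < 2:
        return 0.0  # a single frame has no temporal neighbor
    return mse(frame_diffs(pred_noise), frame_diffs(true_noise))


def total_loss(pred_noise, true_noise, weight=0.1):
    """Standard noise-prediction loss plus the weighted constraint.
    `weight` is a hypothetical hyperparameter chosen for this sketch."""
    return mse(pred_noise, true_noise) + weight * noise_constraint_loss(pred_noise, true_noise)
```

Under this reading, a prediction error that is identical in every frame leaves the constraint term at zero and is penalized only by the per-frame loss, whereas an error that varies from frame to frame is penalized by both terms. That is one way a loss could favor temporally smooth latents.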

