CV · Aug 19, 2023

MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance

Ernie Chu, Tzuhsuan Huang, Shuo-Yen Lin, Jun-Cheng Chen

arXiv:2308.10079v3 · 13.125 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the problem of generating temporally consistent videos for applications such as scene rendering and text-guided editing. It is an incremental improvement that leverages existing pre-trained models without fine-tuning.

The study tackles video-to-video translation by introducing MeDM, an efficient method that uses pre-trained image Diffusion Models with temporal correspondence guidance. It reports superior results in qualitative, quantitative, and subjective experiments on various benchmarks.

This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observation-space scores in latent Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion Models. Through extensive qualitative, quantitative, and subjective experiments on various benchmarks, the study demonstrates the effectiveness and superiority of the proposed approach. Our project page can be found at https://medm2023.github.io
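The abstract states that temporal consistency can be framed as an optimization problem with a closed-form solution, but it does not spell the problem out. One plausible reading, offered here only as an illustration and not as the authors' implementation: if pixels linked by optical flow are required to share a single value, then the least-squares projection of independent per-frame estimates onto that constraint is the mean over each group of linked pixels. The sketch below assumes a hypothetical integer map `ids` (same id means same scene point, produced by some flow-tracking step that is not shown) and a hypothetical helper name `mediate`.

```python
import numpy as np

def mediate(frames, ids):
    """Replace every pixel by the mean of all pixels that share its id.

    frames: array of shape (T, H, W, C), independent per-frame estimates.
    ids:    int array of shape (T, H, W); equal ids are ASSUMED to mark
            the same scene point across frames (hypothetical input).
    """
    frames = np.asarray(frames, dtype=np.float64)
    flat = frames.reshape(-1, frames.shape[-1])   # (N, C) pixel values
    flat_ids = np.asarray(ids).reshape(-1)        # (N,) group id per pixel
    n_groups = int(flat_ids.max()) + 1

    # Scatter-add pixel values into their groups, then divide by group size.
    sums = np.zeros((n_groups, flat.shape[1]))
    np.add.at(sums, flat_ids, flat)
    counts = np.bincount(flat_ids, minlength=n_groups)
    means = sums / np.maximum(counts, 1)[:, None]

    # Broadcast each group mean back to every pixel in the group.
    return means[flat_ids].reshape(frames.shape)

# Two 1x2 single-channel frames; the left pixels are linked (id 0),
# the right pixels are not (ids 1 and 2).
frames = np.array([[[[1.0], [2.0]]], [[[3.0], [10.0]]]])
ids = np.array([[[0, 1]], [[0, 2]]])
result = mediate(frames, ids)
```

In this toy case the two linked pixels both become their average, while the unlinked pixels keep their original values. Averaging is the exact minimizer only under the equal-value constraint and squared-error objective assumed here; the paper's actual coding and objective may differ.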
