CV · Apr 12, 2023

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman

arXiv:2304.06025v435.2225 citations · h-index: 34

Originality: Incremental advance

AI Analysis

This work addresses the need for automated fashion video creation. The advance is rated incremental because it builds on existing diffusion models.

The authors tackled the problem of generating animated fashion videos from still images and sequences of human body poses, achieving state-of-the-art results in fashion video animation.

We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.
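
The abstract does not spell out the architectural changes, so the following is only an illustrative sketch of one common way to let a pretrained layer accept added conditioning channels: widen its input and initialize the new weights to zero, so the layer's output is unchanged before fine-tuning. The toy 1x1 convolution, the channel counts, and the names `conv1x1` and `widen_input` are all assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical sketch: widen a 1x1 convolution so it accepts extra
# conditioning channels without changing its pretrained output.
# All sizes and values below are made up for illustration.

def conv1x1(weights, bias, pixel):
    """Apply a 1x1 convolution (a per-pixel linear map) to one pixel.

    weights: one row per output channel, each row of length C_in
    bias:    one value per output channel
    pixel:   list of C_in input channel values
    """
    return [
        sum(w * x for w, x in zip(row, pixel)) + b
        for row, b in zip(weights, bias)
    ]

def widen_input(weights, extra_channels):
    """Append zero-initialized input columns for the new channels."""
    return [row + [0.0] * extra_channels for row in weights]

# Toy "pretrained" layer: 4 latent channels in, 2 channels out.
pretrained_w = [[0.5, -1.0, 0.25, 2.0],
                [1.5, 0.0, -0.5, 1.0]]
pretrained_b = [0.1, -0.2]

latent_pixel = [1.0, 2.0, 3.0, 4.0]
pose_pixel = [0.7, -0.3, 0.9]  # made-up pose features at this pixel

widened_w = widen_input(pretrained_w, extra_channels=len(pose_pixel))

# Output on the original input vs. output on input + pose channels.
before = conv1x1(pretrained_w, pretrained_b, latent_pixel)
after = conv1x1(widened_w, pretrained_b, latent_pixel + pose_pixel)

# The zero-initialized columns contribute nothing, so both match.
print(before == after)
```

Because the new columns start at zero, the widened layer initially ignores the pose channels and reproduces the pretrained behavior; fine-tuning can then learn nonzero weights for them.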
