DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
This work addresses the need for automated fashion video creation; its contribution is incremental, as it builds upon existing diffusion models.
The authors tackled the problem of generating animated fashion videos from still images and sequences of human body poses, achieving state-of-the-art results in fashion video animation.
We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.
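The abstract does not spell out the architectural changes. One common way to feed an extra conditioning signal, such as a pose map, into a pretrained diffusion U-Net is to concatenate it with the noisy latent along the channel axis and widen the first convolution, initializing the new weights to zero so the pretrained behavior is preserved when fine-tuning starts. The NumPy sketch below illustrates only that general idea under stated assumptions; it is not the authors' implementation, and the helper names (`widen_input_layer`, `conv1x1`), the 2 pose channels, and the 1x1 kernel are hypothetical simplifications.

```python
import numpy as np

def widen_input_layer(weight: np.ndarray, extra_in: int) -> np.ndarray:
    """Append zero-initialized input channels to a conv weight of shape
    (out_ch, in_ch, kh, kw). Hypothetical helper: with zero weights the
    widened layer initially ignores the new conditioning channels."""
    out_ch, _, kh, kw = weight.shape
    extra = np.zeros((out_ch, extra_in, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, extra], axis=1)

def conv1x1(weight: np.ndarray, x: np.ndarray) -> np.ndarray:
    """1x1 convolution (a simplification): mixes channels independently
    at every pixel. weight: (out_ch, in_ch, 1, 1); x: (in_ch, H, W)."""
    return np.einsum("oi,ihw->ohw", weight[:, :, 0, 0], x)

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8))  # noisy latent (Stable Diffusion's VAE uses 4 channels)
pose = rng.standard_normal((2, 8, 8))    # hypothetical pose conditioning map (2 channels assumed)
w = rng.standard_normal((16, 4, 1, 1))   # stand-in for a pretrained input-layer weight

w_wide = widen_input_layer(w, extra_in=pose.shape[0])
out_orig = conv1x1(w, latent)
out_wide = conv1x1(w_wide, np.concatenate([latent, pose], axis=0))
print(np.allclose(out_orig, out_wide))  # True: zero weights make the pose channels inert at init
```

Because the new weights start at zero, the widened layer reproduces the pretrained layer's output at the first fine-tuning step; training then learns how strongly to use the added channels, which avoids disrupting the pretrained model with randomly initialized inputs.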