CV, LG · Oct 21, 2019

DwNet: Dense warp-based network for pose-guided human video generation

Polina Zablotskaia, Aliaksandr Siarohin, Bo Zhao, Leonid Sigal

arXiv:1910.09139v1 · 23.7 · 169 citations

Originality: Incremental advance

AI Analysis

The paper addresses realistic human video generation, with applications such as animation and virtual try-on. The contribution is incremental: it refines existing GAN-based methods.

The paper tackles human motion transfer: it generates high-resolution video of a subject, seen in a single image, performing the motion of a driving video, and reports state-of-the-art performance on the TaiChi and Fashion Modeling datasets.

Generation of realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this paper, we focus on human motion transfer: the generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages a dense intermediate pose-guided representation and a refinement process to warp the required subject appearance, in the form of texture, from the source image into the desired pose. Temporal consistency is maintained by additionally conditioning the decoding process within the GAN on the previously generated frame, so the video is generated iteratively and recurrently. We illustrate the efficacy of our approach by showing state-of-the-art quantitative and qualitative performance on two benchmark datasets: TaiChi and Fashion Modeling. The latter was collected by us and will be made publicly available to the community.
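The abstract combines two mechanisms: warping the subject's texture from the source image into the target pose, and decoding each frame conditioned on the previously generated one. The following is a minimal NumPy sketch of those two ideas under stated assumptions, not the authors' implementation. It assumes a precomputed dense correspondence field `coords` (a hypothetical input standing in for DwNet's pose-guided representation) and backward-warps the source with bilinear sampling. The recurrent loop uses a hypothetical `refine` callable as a stand-in for the learned GAN decoder; no networks are trained here.

```python
import numpy as np


def bilinear_warp(source, coords):
    """Backward-warp `source` (H_s, W_s, C) with `coords` (H, W, 2).

    coords[y, x] = (sy, sx) is the continuous source location whose
    appearance should land on target pixel (y, x). Locations outside the
    image are clamped to the border (an assumption of this sketch).
    """
    h_s, w_s = source.shape[:2]
    sy = np.clip(coords[..., 0], 0, h_s - 1)
    sx = np.clip(coords[..., 1], 0, w_s - 1)
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h_s - 1)
    x1 = np.minimum(x0 + 1, w_s - 1)
    wy = (sy - y0)[..., None]  # vertical interpolation weight
    wx = (sx - x0)[..., None]  # horizontal interpolation weight
    top = (1 - wx) * source[y0, x0] + wx * source[y0, x1]
    bottom = (1 - wx) * source[y1, x0] + wx * source[y1, x1]
    return (1 - wy) * top + wy * bottom


def generate_video(source, driving_coords, refine):
    """Hypothetical recurrent loop: each frame is conditioned on the last.

    `refine(warped, prev)` stands in for the GAN decoder. Using the source
    image as the "previous frame" at step 0 is an assumption.
    """
    frames = []
    prev = source
    for coords in driving_coords:
        warped = bilinear_warp(source, coords)  # texture moved into the pose
        frame = refine(warped, prev)            # decode with temporal context
        frames.append(frame)
        prev = frame                            # feed output back in
    return frames


# Usage: an identity correspondence field reproduces the source image.
h, w = 4, 5
src = np.arange(h * w * 3, dtype=float).reshape(h, w, 3)
yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
identity = np.stack([yy, xx], axis=-1).astype(float)
toy_refine = lambda warped, prev: 0.5 * warped + 0.5 * prev  # hypothetical
video = generate_video(src, [identity] * 3, toy_refine)
```

Backward warping (computing, for each target pixel, where to sample in the source) is used in this sketch because it produces a value for every output pixel, whereas forward-splatting source pixels into the target can leave holes.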

