CV · LG · Oct 21, 2019

DwNet: Dense warp-based network for pose-guided human video generation

arXiv:1910.09139v1 · 169 citations
Originality: Incremental advance
AI Analysis

The paper addresses realistic human video generation for applications such as animation and virtual try-on. The advance is incremental: it builds on existing GAN-based methods and adds refinements.

The paper tackles human motion transfer by generating high-resolution videos of a subject from a single image using motion from a driving video, achieving state-of-the-art performance on TaiChi and Fashion Modeling datasets.

Generation of realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this paper, we focus on human motion transfer - generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages dense intermediate pose-guided representation and refinement process to warp the required subject appearance, in the form of the texture, from a source image into a desired pose. Temporal consistency is maintained by further conditioning the decoding process within a GAN on the previously generated frame. In this way a video is generated in an iterative and recurrent fashion. We illustrate the efficacy of our approach by showing state-of-the-art quantitative and qualitative performance on two benchmark datasets: TaiChi and Fashion Modeling. The latter is collected by us and will be made publicly available to the community.
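The two mechanisms the abstract names, warping the source appearance into a target pose and conditioning each frame on the previously generated one, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `warp_image` and `generate_video`, the `(dx, dy)` flow convention, and the fixed `blend` weight are all assumptions made for illustration. In DwNet the warp field would come from the dense pose representation and the blending step would be a learned GAN decoder; here a simple weighted average stands in for it.

```python
import numpy as np


def warp_image(source, flow):
    """Backward-warp `source` (H, W, C) with a dense flow field (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy) means output pixel (x, y)
    samples the source at (x + dx, y + dy). Sampling is bilinear and
    coordinates are clamped to the image border.
    """
    h, w = source.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (sx - x0)[..., None]  # horizontal interpolation weights
    wy = (sy - y0)[..., None]  # vertical interpolation weights
    top = source[y0, x0] * (1 - wx) + source[y0, x1] * wx
    bottom = source[y1, x0] * (1 - wx) + source[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def generate_video(source, flows, blend=0.5):
    """Generate frames iteratively, one per flow field.

    Each frame mixes the warped source with the previous output. The
    weighted average is a hypothetical placeholder for a learned decoder
    that would be conditioned on the previously generated frame.
    """
    source = source.astype(float)
    frames = []
    prev = source
    for flow in flows:
        warped = warp_image(source, flow)
        frame = blend * warped + (1 - blend) * prev
        frames.append(frame)
        prev = frame  # the recurrence: the next step sees this frame
    return frames
```

Calling `generate_video(image, flows)` with one flow field per driving frame returns a list of frames of the same shape as `image`. A zero flow field leaves the source unchanged, which makes the warp easy to check in isolation.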

Code Implementations: 3 repos
