CV · Feb 22, 2021

Style and Pose Control for Image Synthesis of Humans from a Single Monocular View

arXiv:2102.11263v1 · 72 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for high-quality human image re-rendering in applications such as virtual try-on and motion transfer. The advance is incremental, since it builds on existing GAN-based approaches.

The paper tackles the problem of synthesizing photo-realistic human images from a single view with control over pose and appearance. It reports state-of-the-art fidelity, outperforming existing methods on perceptual metrics and in a user study.

Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of the body parts and garments as well as severe changes of the textures. We, therefore, propose a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, i.e., StylePoseGAN, where we extend a non-controllable generator to accept conditioning of pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and convinces in a comprehensive user study.
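To make the idea of a disentangled representation concrete, here is a minimal sketch of how separate pose and part-based appearance codes could support garment transfer and appearance interpolation. It is an illustration under stated assumptions, not the paper's implementation: the `PersonCode` container, the function names, the part names and the tiny example vectors are all hypothetical, and the generator network that would turn a code into an image is omitted entirely.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass(frozen=True)
class PersonCode:
    """Hypothetical disentangled code: one pose vector plus one
    appearance vector per body part (names are illustrative only)."""
    pose: Vector
    appearance: Dict[str, Vector]


def transfer_parts(target: PersonCode, source: PersonCode,
                   parts: List[str]) -> PersonCode:
    """Copy the appearance of the named parts from `source`,
    keeping the pose and all other parts of `target`."""
    merged = dict(target.appearance)
    for part in parts:
        merged[part] = list(source.appearance[part])
    return PersonCode(pose=list(target.pose), appearance=merged)


def interpolate_appearance(a: PersonCode, b: PersonCode,
                           t: float) -> PersonCode:
    """Linearly blend the appearance codes of `a` and `b`
    (t=0 gives a, t=1 gives b); the pose of `a` is kept."""
    blended = {
        part: [(1.0 - t) * x + t * y
               for x, y in zip(a.appearance[part], b.appearance[part])]
        for part in a.appearance
    }
    return PersonCode(pose=list(a.pose), appearance=blended)


# Toy codes; real codes would come from learned encoders (assumption).
alice = PersonCode(pose=[0.0, 1.0],
                   appearance={"head": [1.0, 0.0], "torso": [0.2, 0.4]})
bob = PersonCode(pose=[1.0, 0.0],
                 appearance={"head": [0.0, 1.0], "torso": [0.8, 0.6]})

# Garment transfer: Alice's pose and head, Bob's torso appearance.
try_on = transfer_parts(alice, bob, ["torso"])

# Appearance interpolation: halfway between Alice and Bob, in Alice's pose.
halfway = interpolate_appearance(alice, bob, 0.5)
```

Because pose and appearance live in separate fields, each edit touches only one of them. That separation is the property the abstract relies on for motion transfer, virtual try-on and head swap.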

