CV · Nov 25, 2022

Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning

arXiv:2211.14052v1 · 17 citations · h-index: 27 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a challenging problem in virtual try-on for e-commerce and fashion applications, offering an incremental improvement by incorporating 3D priors to handle pose and viewpoint variations more effectively.

The paper tackles image-based virtual try-on with diverse poses and large viewpoint variations by introducing 3D-aware global correspondences that encode geometric priors, addressing limitations of existing 2D methods. It demonstrates superiority over state-of-the-art approaches on public benchmarks and a new HardPose test set.

In this paper, we target image-based person-to-person virtual try-on in the presence of diverse poses and large viewpoint variations. Existing methods are limited in this setting because they estimate garment warping flows mainly from 2D poses and appearance, which omits the geometric prior of the 3D human body shape. Moreover, current garment warping methods are confined to localized regions, which makes them ineffective at capturing long-range dependencies and results in inferior flows with artifacts. To tackle these issues, we present 3D-aware global correspondences, which are reliable flows that jointly encode global semantic correlations, local deformations, and geometric priors of 3D human bodies. Specifically, given an image pair depicting the source and target person, (a) we first obtain their pose-aware, high-level representations via two encoders and introduce a coarse-to-fine decoder with multiple refinement modules to predict the pixel-wise global correspondence; (b) 3D parametric human models inferred from the images are incorporated as priors to regularize the correspondence refinement process, so that our flows are 3D-aware and better handle variations in pose and viewpoint; (c) finally, an adversarial generator takes the garment warped by the 3D-aware flow and the image of the target person as inputs to synthesize the photo-realistic try-on result. Extensive experiments on public benchmarks and our HardPose test set demonstrate the superiority of our method over state-of-the-art try-on approaches.
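
To make the notion of a pixel-wise correspondence concrete, here is a minimal NumPy sketch of how a garment image could be warped once such a correspondence is known. It is not the authors' implementation: the function names `identity_correspondence` and `warp_by_correspondence`, the `(x, y)` coordinate layout, and the choice of bilinear sampling with border clamping are all assumptions made for illustration. The abstract only states that a predicted flow warps the garment before an adversarial generator synthesizes the result.

```python
import numpy as np


def identity_correspondence(h, w):
    """Hypothetical helper: a correspondence where every target pixel
    points at the same location in the source image."""
    ys, xs = np.mgrid[0:h, 0:w]
    # Last axis holds (x, y) source coordinates for each target pixel.
    return np.stack([xs, ys], axis=-1).astype(np.float64)


def warp_by_correspondence(source, corr):
    """Backward-warp `source` (H, W, C) with `corr` (H_out, W_out, 2).

    corr[y, x] = (sx, sy) gives the source location sampled by target
    pixel (x, y). Sampling is bilinear; coordinates outside the image
    are clamped to the border (an assumption of this sketch).
    """
    h, w = source.shape[:2]
    sx = np.clip(corr[..., 0], 0, w - 1)
    sy = np.clip(corr[..., 1], 0, h - 1)

    # Integer corners surrounding each sampling location.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, broadcast over the channel axis.
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]

    top = (1 - wx) * source[y0, x0] + wx * source[y0, x1]
    bottom = (1 - wx) * source[y1, x0] + wx * source[y1, x1]
    return (1 - wy) * top + wy * bottom


# Toy "garment": a 4x5 image with 3 channels.
garment = np.arange(4 * 5 * 3, dtype=np.float64).reshape(4, 5, 3)

# Each target pixel samples one pixel to its right in the source.
shift_right = identity_correspondence(4, 5) + np.array([1.0, 0.0])
warped = warp_by_correspondence(garment, shift_right)
```

In this toy usage the warped image is the garment shifted left by one pixel, with the last column repeated because of border clamping. A learned correspondence would replace `shift_right` with a dense, per-pixel field predicted by the network.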

Code Implementations: 1 repo
