CV · Apr 12, 2021

Multi-View Image-to-Image Translation Supervised by 3D Pose

arXiv:2104.05779v1 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating realistic multi-view person images for applications such as virtual reality and animation. The contribution is incremental because it builds on existing image-to-image translation methods.

The paper tackles multi-view image-to-image translation for person image generation. It proposes an end-to-end framework that jointly trains the per-view translation models under 3D pose constraints, so that the synthesized photo-realistic images show consistent poses across views. On the CMU-Panoptic dataset, the generated images are more pose-consistent across views than those of a standard image-to-image baseline.

We address the task of multi-view image-to-image translation for person image generation. The goal is to synthesize photo-realistic multi-view images with pose-consistency across all views. Our proposed end-to-end framework is based on a joint learning of multiple unpaired image-to-image translation models, one per camera viewpoint. The joint learning is imposed by constraints on the shared 3D human pose in order to encourage the 2D pose projections in all views to be consistent. Experimental results on the CMU-Panoptic dataset demonstrate the effectiveness of the suggested framework in generating photo-realistic images of persons with new poses that are more consistent across all views in comparison to a standard Image-to-Image baseline. The code is available at: https://github.com/sony-si/MultiView-Img2Img
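The abstract does not spell out how the shared 3D pose constrains the per-view 2D poses, so the following is only a minimal geometric sketch of one way such a consistency term could be measured, not the authors' implementation. It assumes known 3x4 camera projection matrices and per-view 2D joint detections, triangulates a shared 3D pose with the standard Direct Linear Transform (a technique swapped in here for illustration), and reports the mean reprojection error across views. The function names `project`, `triangulate`, and `pose_consistency_loss` are hypothetical and do not come from the linked repository; an end-to-end trainable version would need a differentiable framework rather than NumPy.

```python
import numpy as np


def project(P, X):
    """Project (J, 3) 3D joints with a 3x4 camera matrix P to (J, 2) image points."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates, (J, 4)
    x = Xh @ P.T                                   # (J, 3)
    return x[:, :2] / x[:, 2:3]                    # perspective divide


def triangulate(Ps, xs):
    """Estimate a shared 3D pose from V views with the Direct Linear Transform.

    Ps: sequence of V camera matrices, each (3, 4)  (assumed known)
    xs: array of 2D joints, shape (V, J, 2)
    Returns an array of 3D joints, shape (J, 3).
    """
    num_joints = xs.shape[1]
    X = np.zeros((num_joints, 3))
    for j in range(num_joints):
        rows = []
        for P, x in zip(Ps, xs[:, j]):
            # Each view contributes two linear equations in the homogeneous 3D point.
            rows.append(x[0] * P[2] - P[0])
            rows.append(x[1] * P[2] - P[1])
        A = np.stack(rows)
        # The least-squares solution is the right singular vector
        # associated with the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        Xh = Vt[-1]
        X[j] = Xh[:3] / Xh[3]
    return X


def pose_consistency_loss(Ps, xs):
    """Mean reprojection error of the triangulated shared 3D pose over all views."""
    X = triangulate(Ps, xs)
    errors = [np.linalg.norm(project(P, X) - x, axis=1) for P, x in zip(Ps, xs)]
    return float(np.mean(errors))
```

If the 2D poses in all views are projections of the same 3D skeleton, this error is close to zero; if one view's pose drifts, no single 3D pose explains all views and the error grows. A penalty of this kind is one plausible way to encourage the cross-view consistency the abstract describes.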

Code Implementations: 1 repo
