CV · MM · Jun 6, 2022

Scene Aware Person Image Generation through Global Contextual Conditioning

arXiv:2206.02717v2 · 6 citations · h-index: 51
Originality: Incremental advance
AI Analysis

This paper addresses a challenging computer vision task with applications such as image editing, but the advance appears incremental because it builds on existing generative methods.

The paper tackles the problem of generating and inserting contextually relevant person images into existing scenes by predicting location, pose, and scale to blend with the scene, achieving high-resolution photo-realistic results.

Person image generation is an intriguing yet challenging problem. However, this task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the person being inserted blend in with the existing persons in the scene. Our method uses three individual networks in a sequential pipeline. First, we predict the potential location and the skeletal structure of the new person by conditioning a Wasserstein Generative Adversarial Network (WGAN) on the existing human skeletons present in the scene. Next, the predicted skeleton is refined through a shallow linear network to achieve higher structural accuracy in the generated image. Finally, the target image is generated from the refined skeleton using another generative network conditioned on a given image of the target person. In our experiments, we achieve high-resolution photo-realistic generation results while preserving the general context of the scene. We conclude our paper with multiple qualitative and quantitative benchmarks on the results.
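
The three-stage data flow described in the abstract can be sketched roughly as follows. This is only a structural illustration, not the authors' implementation: every function name, the joint count, and the placeholder logic inside each stage are assumptions made here so that the example runs without trained networks. In the actual method each stage is a learned model (a conditional WGAN, a shallow linear network, and an appearance-conditioned generator).

```python
import random
from typing import List, Tuple

Keypoint = Tuple[float, float]   # (x, y), normalized to [0, 1] scene coordinates
Skeleton = List[Keypoint]
NUM_JOINTS = 18                  # assumption: the abstract does not state a joint count


def predict_skeleton(existing: List[Skeleton], rng: random.Random) -> Skeleton:
    """Stage 1 stand-in for the WGAN conditioned on existing skeletons.

    Placeholder logic (not the paper's): average the existing poses and
    shift the result horizontally by a random offset from the noise source.
    """
    n = len(existing)
    mean_pose = [
        (sum(s[j][0] for s in existing) / n, sum(s[j][1] for s in existing) / n)
        for j in range(NUM_JOINTS)
    ]
    dx = rng.uniform(-0.2, 0.2)
    return [(x + dx, y) for x, y in mean_pose]


def refine_skeleton(skeleton: Skeleton) -> Skeleton:
    """Stage 2 stand-in for the shallow linear refinement network.

    Placeholder logic (not the paper's): clamp keypoints into the frame.
    """
    def clamp(v: float) -> float:
        return min(1.0, max(0.0, v))

    return [(clamp(x), clamp(y)) for x, y in skeleton]


def render_person(skeleton: Skeleton, target_image: str) -> dict:
    """Stage 3 stand-in for the generator conditioned on the target person.

    Placeholder logic: report the bounding box where the person would be drawn.
    """
    xs, ys = zip(*skeleton)
    return {"appearance": target_image,
            "bbox": (min(xs), min(ys), max(xs), max(ys))}


def insert_person(existing: List[Skeleton], target_image: str, seed: int = 0) -> dict:
    """Run the three stages in sequence, as the abstract describes."""
    rng = random.Random(seed)
    coarse = predict_skeleton(existing, rng)
    refined = refine_skeleton(coarse)
    return render_person(refined, target_image)


# Two hypothetical persons already present in the scene.
scene = [
    [(0.30 + 0.01 * j, 0.20 + 0.03 * j) for j in range(NUM_JOINTS)],
    [(0.60 + 0.01 * j, 0.25 + 0.03 * j) for j in range(NUM_JOINTS)],
]
result = insert_person(scene, "target_person.png")
```

The point of the sketch is the ordering: the scene context (existing skeletons) only enters at stage 1, while the target person's appearance only enters at stage 3, so pose placement and appearance synthesis stay decoupled.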

