CV · AI · GR · LG · Apr 27, 2023

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

Berkeley · Stanford
arXiv:2304.14406v1 · 64 citations · h-index: 111
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generating realistic human-scene compositions for applications such as image editing and synthesis. The advance is incremental, since it builds on existing diffusion models and self-supervised learning.

The paper tackles realistic insertion of people into scenes by inferring scene affordances. In its quantitative evaluation, the resulting method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.

We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.
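The self-supervised setup described in the abstract (learning to re-pose humans in video clips) can be illustrated with a minimal sketch of how one training example might be assembled from two frames of the same clip. This is not the authors' pipeline. It assumes that per-frame person masks are already available, that the marked region is the person's bounding box, and that masked pixels are set to zero. The names `bounding_box` and `make_training_pair` are hypothetical.

```python
import numpy as np


def bounding_box(mask):
    """Return (top, bottom, left, right) of the True region; bottom/right are exclusive."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("mask contains no person pixels")
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def make_training_pair(frame_a, mask_a, frame_b, mask_b):
    """Build (masked_scene, region, reference_person, target) from two frames of one clip.

    frame_*: (H, W, 3) images; mask_*: (H, W) boolean person masks (assumed given).
    """
    # Target frame: hide the person by blanking their bounding box (assumption:
    # a box rather than the exact silhouette, so the pose is not leaked).
    top, bottom, left, right = bounding_box(mask_b)
    region = np.zeros(mask_b.shape, dtype=bool)
    region[top:bottom, left:right] = True
    masked_scene = frame_b.copy()
    masked_scene[region] = 0

    # Reference person: the same person in a different frame (and pose),
    # with the background removed and cropped to the person's extent.
    t, b, l, r = bounding_box(mask_a)
    reference_person = np.where(mask_a[..., None], frame_a, 0)[t:b, l:r]

    # A generative model would be trained to reconstruct frame_b from the
    # masked scene, the marked region, and the reference person.
    return masked_scene, region, reference_person, frame_b
```

Because the reference and target come from different frames, the person's pose generally differs between them, so a model trained on such pairs has to infer a pose that fits the scene instead of copying the reference pixels.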

Code Implementations: 1 repo
