CV AI GR LG | Apr 27, 2023

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh

Berkeley, Stanford

arXiv:2304.14406v1 | 23.864 citations | h-index: 111 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of generating realistic human-scene compositions for applications such as image editing and synthesis. The advance is incremental, since it builds on existing diffusion models and self-supervised learning.

The paper tackles the problem of realistically inserting people into scenes by inferring scene affordances. The resulting method synthesizes more realistic human appearance and more natural human-scene interactions than prior work, as shown in a quantitative evaluation.

We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.
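The self-supervised setup described in the abstract (a scene with a marked region, a reference person, and video clips as the source of supervision) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `make_training_example`, the use of a rectangular bounding box as the marked region, and the assumption that person segmentation masks and boxes are already available are all hypothetical choices made for the example.

```python
import numpy as np


def make_training_example(frame_a, frame_b, mask_b, box_a):
    """Build one training example from two frames of the same clip.

    Hypothetical sketch; names and masking scheme are assumptions.
      frame_a, frame_b: uint8 arrays of shape (H, W, 3)
      mask_b: bool array (H, W), person segmentation in frame_b (assumed given)
      box_a: (top, left, bottom, right) box around the person in frame_a
    """
    top, left, bottom, right = box_a

    # Marked region: here simply the person's bounding box in frame_a.
    region = np.zeros(frame_a.shape[:2], dtype=bool)
    region[top:bottom, left:right] = True

    # Scene input: frame_a with the marked region blanked out.
    masked_scene = frame_a.copy()
    masked_scene[region] = 0

    # Reference person: the same person in a different frame (and so,
    # typically, a different pose), with the background zeroed out.
    reference_person = np.where(mask_b[..., None], frame_b, 0).astype(frame_b.dtype)

    # The unmodified frame_a is the reconstruction target.
    return {
        "scene": masked_scene,
        "region": region,
        "person": reference_person,
        "target": frame_a,
    }
```

Because the target frame shows the person in a pose that fits the scene while the reference shows them in another pose, a model trained on such examples has to re-pose the person to match the scene context, which is the sense in which the task is self-supervised.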
