PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation
This work addresses a practical problem in personalized face generation: facial features are overemphasized at the expense of full-body coherence. The contribution is incremental, however, because it builds on existing diffusion models.
The paper tackles the problem of generating identity-consistent human-object interaction images without training or tuning, by fusing a general StableDiffusion model with a personalized face diffusion model, resulting in superior realism and scalability as validated by a novel interaction alignment metric.
We introduce PersonaHOI, a training- and tuning-free framework that fuses a general StableDiffusion model with a personalized face diffusion (PFD) model to generate identity-consistent human-object interaction (HOI) images. While existing PFD models have advanced significantly, they often overemphasize facial features at the expense of full-body coherence. To address this, PersonaHOI introduces an additional StableDiffusion (SD) branch guided by HOI-oriented text inputs. By incorporating cross-attention constraints in the PFD branch and spatial merging at both latent and residual levels, PersonaHOI preserves personalized facial details while keeping non-facial regions interactive. Experiments, validated by a novel interaction alignment metric, demonstrate the superior realism and scalability of PersonaHOI, establishing a new standard for practical personalized face generation with HOI. Our code will be available at https://github.com/JoyHuYY1412/PersonaHOI
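To make the idea of latent-level spatial merging concrete, here is a minimal sketch of mask-based blending between two branches. It is an illustration under assumptions, not the released PersonaHOI implementation: the function name `merge_latents`, the `(channels, height, width)` layout, and the binary face mask are all hypothetical, and the abstract does not specify how the mask is obtained or how merging is done at the residual level.

```python
import numpy as np


def merge_latents(pfd_latent, sd_latent, face_mask):
    """Blend two latents spatially (hypothetical helper, not the paper's API).

    Inside the face mask the personalized (PFD) latent is kept;
    everywhere else the general (SD) latent is used.
    """
    if pfd_latent.shape != sd_latent.shape:
        raise ValueError("both latents must have the same shape")
    if face_mask.shape != pfd_latent.shape[1:]:
        raise ValueError("mask must match the latent's spatial size")
    # Add a leading axis so the (H, W) mask broadcasts over channels.
    mask = face_mask.astype(pfd_latent.dtype)[None, :, :]
    return mask * pfd_latent + (1.0 - mask) * sd_latent


# Toy data (assumed shapes): the PFD latent is all ones, the SD latent all zeros,
# so the merged result shows directly which branch each location came from.
pfd = np.ones((4, 8, 8), dtype=np.float32)
sd = np.zeros((4, 8, 8), dtype=np.float32)
mask = np.zeros((8, 8), dtype=np.float32)
mask[1:4, 2:6] = 1.0  # assumed face region: 3 rows by 4 columns

merged = merge_latents(pfd, sd, mask)
```

In this toy setup, locations inside the assumed face region take their values from the PFD branch and all other locations from the SD branch. A soft mask with values between 0 and 1 would blend the two branches instead of switching between them.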