CV AI Sep 25, 2025

CusEnhancer: A Zero-Shot Scene and Controllability Enhancement Method for Photo Customization via ResInversion

Maoye Ren, Praneetha Vaddamanu, Jianjin Xu, Fernando De la Torre Frade

arXiv:2509.20775v1 3.6 h-index: 2

Originality: Incremental advance

AI Analysis

The paper addresses limited scene quality and weak controllability in personalized photo generation with diffusion models. It is a strong incremental improvement over existing customization methods.

The paper targets three weaknesses of text-to-image diffusion models used for photo customization: degraded scenes, insufficient control, and suboptimal identity fidelity. It introduces CustomEnhancer, a zero-shot enhancement framework that reports state-of-the-art scene diversity and identity fidelity and enables training-free control. Its ResInversion method reduces inversion time by a factor of 129 relative to null-text inversion (NTI).

Recently, remarkable progress has been made in synthesizing realistic human photos using text-to-image diffusion models. However, current approaches suffer from degraded scenes, insufficient control, and suboptimal perceptual identity. We introduce CustomEnhancer, a novel framework to augment existing identity customization models. CustomEnhancer is a zero-shot enhancement pipeline that leverages face swapping techniques and a pretrained diffusion model to obtain additional representations in a zero-shot manner for encoding into personalized models. Through our proposed triple-flow fused PerGeneration approach, which identifies and combines two compatible counter-directional latent spaces to manipulate a pivotal space of the personalized model, we unify the generation and reconstruction processes, realizing generation from three flows. Our pipeline also enables comprehensive training-free control over the generation process of personalized models, offering precisely controlled personalization and eliminating the need to retrain a controller for each model. In addition, to address the high time complexity of null-text inversion (NTI), we introduce ResInversion, a novel inversion method that performs noise rectification via a pre-diffusion mechanism, reducing the inversion time by a factor of 129. Experiments demonstrate that CustomEnhancer reaches SOTA results in scene diversity, identity fidelity, and training-free control, while also showing the efficiency of ResInversion over NTI. The code will be made publicly available upon paper acceptance.
