CV Apr 22, 2025

Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models

Dasol Jeong, Donggoo Kang, Jiwon Park, Hyebean Lee, Joonki Paik

arXiv:2504.15723v26.21 citations h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the challenge of preserving structural integrity while enabling precise modifications in image editing, and is aimed at users who need scalable and adaptable tools. It appears incremental, as it builds on existing diffusion methods.

The paper tackles the problem of zero-shot image editing by proposing a diffusion-based framework that unifies text-guided and reference-guided approaches without fine-tuning, achieving state-of-the-art performance in tasks like expression transfer, texture transformation, and style infusion.

We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy (shape injection in early steps and attribute injection in later steps), we enable precise, fine-grained modifications while maintaining global consistency. Cross-attention with reference latents facilitates semantic alignment between the source and reference. Extensive experiments across expression transfer, texture transformation, and style infusion demonstrate state-of-the-art performance, confirming the method's scalability and adaptability to diverse image editing scenarios.
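
The stage-wise schedule described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say where the boundary between stages lies or how latents are injected. Here the split point (`shape_fraction`), the linear blend (`blend`), and all function names are hypothetical, and latents are plain lists of floats standing in for real diffusion latents.

```python
from typing import List, Tuple


def injection_stage(step: int, num_steps: int, shape_fraction: float = 0.5) -> str:
    """Pick the latent to inject at a denoising step (step 0 is the noisiest).

    ASSUMPTION: a single hard boundary at `shape_fraction` of the schedule.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step must be in [0, num_steps)")
    boundary = int(num_steps * shape_fraction)
    return "shape" if step < boundary else "attribute"


def blend(current: List[float], injected: List[float], weight: float) -> List[float]:
    """ASSUMPTION: injection modelled as a simple linear interpolation."""
    return [(1.0 - weight) * c + weight * i for c, i in zip(current, injected)]


def run_schedule(
    latent: List[float],
    shape_latent: List[float],
    attribute_latent: List[float],
    num_steps: int = 10,
    shape_fraction: float = 0.5,
    weight: float = 0.5,
) -> Tuple[List[float], List[str]]:
    """Walk the schedule, injecting shape latents early and attribute latents late."""
    stages = []
    for step in range(num_steps):
        stage = injection_stage(step, num_steps, shape_fraction)
        source = shape_latent if stage == "shape" else attribute_latent
        # A real pipeline would run a denoising step here; this only blends.
        latent = blend(latent, source, weight)
        stages.append(stage)
    return latent, stages


if __name__ == "__main__":
    final, stages = run_schedule([0.0], [1.0], [2.0], num_steps=4)
    print(stages, final)
```

With four steps and the default split, the first two steps pull the latent toward the shape latent and the last two toward the attribute latent, which mirrors the early-structure, late-detail ordering the abstract describes.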
