CV | Jan 4, 2024

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance

arXiv:2401.02126v1 | 8 citations | h-index: 3 | ICME
Originality: Incremental advance
AI Analysis

This paper addresses a specific challenge in AI image editing and offers an incremental improvement over existing methods.

The paper tackles the problem of combining rigid and non-rigid editing in text-to-image generation, which often leads to misaligned outputs, and presents a framework that achieves precise and versatile editing with competitive or superior results in text-based and appearance transfer tasks.

Existing text-to-image editing methods tend to excel either in rigid or non-rigid editing but encounter challenges when combining both, resulting in misaligned outputs with the provided text prompts. In addition, integrating reference images for control remains challenging. To address these issues, we present a versatile image editing framework capable of executing both rigid and non-rigid edits, guided by either textual prompts or reference images. We leverage a dual-path injection scheme to handle diverse editing scenarios and introduce an integrated self-attention mechanism for fusion of appearance and structural information. To mitigate potential visual artifacts, we further employ latent fusion techniques to adjust intermediate latents. Compared to previous work, our approach represents a significant advance in achieving precise and versatile image editing. Comprehensive experiments validate the efficacy of our method, showcasing competitive or superior results in text-based editing and appearance transfer tasks, encompassing both rigid and non-rigid settings.
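
The abstract names two mechanisms without giving their details: a self-attention step that fuses appearance and structural information, and a latent fusion step that adjusts intermediate latents. The sketch below is only a generic illustration of those two ideas and is not taken from the paper. It assumes that fusion means queries come from a structure path while keys and values come from an appearance path, and that latent fusion is a per-element mask blend. The function names (fused_self_attention, fuse_latents) and the mask convention are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def fused_self_attention(q_struct, k_app, v_app):
    """Assumed fusion: queries from the structure path attend to
    keys/values from the appearance path (scaled dot-product attention).
    Each argument is a list of token vectors (lists of floats)."""
    scale = 1.0 / math.sqrt(len(q_struct[0]))
    dim_v = len(v_app[0])
    out = []
    for q in q_struct:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_app]
        weights = softmax(scores)
        # Each output token is a weighted average of appearance values.
        out.append([sum(w * v[d] for w, v in zip(weights, v_app))
                    for d in range(dim_v)])
    return out


def fuse_latents(edited, source, mask):
    """Assumed latent fusion: mask=1 keeps the edited latent,
    mask=0 restores the source latent, values in between blend."""
    return [m * e + (1.0 - m) * s for e, s, m in zip(edited, source, mask)]
```

Under these assumptions, each output token of fused_self_attention is a convex combination of appearance values placed according to the structure queries, and fuse_latents restores source content wherever the mask is zero. The paper's actual dual-path injection scheme may differ in where and when these operations are applied.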

Code Implementations: 1 repo