CV · AI · Mar 26, 2024

AID: Attention Interpolation of Text-to-Image Diffusion

arXiv:2403.17924v3 · 27 citations · h-index: 5 · Has Code · NIPS
Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in text-to-image diffusion models for applications that require controlled image generation, and represents an incremental improvement over existing methods.

The paper tackles the problem of generating smooth and consistent images when interpolating between text or pose conditions in diffusion models, introducing a training-free technique called AID that improves fidelity and smoothness with control over the interpolation path.

Conditional diffusion models can create unseen images in various settings, aiding image interpolation. Interpolation in latent spaces is well-studied, but interpolation with specific conditions like text or poses is less understood. Simple approaches, such as linear interpolation in the space of conditions, often result in images that lack consistency, smoothness, and fidelity. To that end, we introduce a novel training-free technique named Attention Interpolation via Diffusion (AID). Our key contributions include 1) proposing an inner/outer interpolated attention layer; 2) fusing the interpolated attention with self-attention to boost fidelity; and 3) applying beta distribution to selection to increase smoothness. We also present a variant, Prompt-guided Attention Interpolation via Diffusion (PAID), that considers interpolation as a condition-dependent generative process. This method enables the creation of new images with greater consistency, smoothness, and efficiency, and offers control over the exact path of interpolation. Our approach demonstrates effectiveness for conceptual and spatial interpolation. Code and demo are available at https://github.com/QY-H00/attention-interpolation-diffusion.
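The difference between the inner and outer interpolated attention named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (that lives in the linked repository); it is a minimal single-query, pure-Python illustration under stated assumptions. The function names (`outer_interp`, `inner_interp`, `beta_coefficients`), the fusion by concatenating self-attention keys and values, and the quantile-based coefficient selection are all hypothetical choices made here for clarity.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def lerp(a, b, t):
    """Linear interpolation between two equal-length vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def outer_interp(q, kv_a, kv_b, t):
    """Assumed 'outer' variant: attend to each condition, then blend the outputs."""
    return lerp(attention(q, *kv_a), attention(q, *kv_b), t)


def inner_interp(q, kv_a, kv_b, t, kv_self=None):
    """Assumed 'inner' variant: blend keys/values first, then attend once.

    If kv_self is given, its keys/values are appended so the query also
    attends to itself (a hypothetical stand-in for fusing with self-attention).
    """
    keys = [lerp(ka, kb, t) for ka, kb in zip(kv_a[0], kv_b[0])]
    values = [lerp(va, vb, t) for va, vb in zip(kv_a[1], kv_b[1])]
    if kv_self is not None:
        keys = keys + kv_self[0]
        values = values + kv_self[1]
    return attention(q, keys, values)


def beta_coefficients(n, alpha, beta, samples=20000, seed=0):
    """Pick n interior coefficients at evenly spaced empirical quantiles of
    Beta(alpha, beta). This is an illustrative selection rule, not the paper's."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(samples))
    return [draws[(i * samples) // (n + 1)] for i in range(1, n + 1)]
```

In this sketch, `kv_a` and `kv_b` are `(keys, values)` pairs for the two endpoint conditions, and `t` in [0, 1] is the interpolation coefficient. Both variants reproduce the endpoint attention at `t = 0` and `t = 1`; they differ only in between. Choosing `alpha` and `beta` greater than 1 concentrates the coefficients toward the middle of the path, where uniformly spaced coefficients tend to produce the largest visual jumps.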

Code Implementations: 1 repo
