CV · AI · Dec 17, 2024

Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation

arXiv:2412.12771v2 · 2 citations · h-index: 15 · Has Code · AAAI
Originality: Incremental advance
AI Analysis

This work provides a plug-and-play solution for enhancing large-content image generation, which is incremental but addresses specific bottlenecks in existing fusion-based methods.

The paper tackles the problem of generating large images with small diffusion models. It targets artifacts such as seams and style inconsistencies in patch fusion methods, and reports significant quality improvements from guided and variance-corrected fusion with one-shot style alignment.

Producing large images using small diffusion models is gaining increasing popularity, as the cost of training large models could be prohibitive. A common approach involves jointly generating a series of overlapped image patches and obtaining large images by merging adjacent patches. However, results from existing methods often exhibit noticeable artifacts, e.g., seams and inconsistent objects and styles. To address the issues, we proposed Guided Fusion (GF), which mitigates the negative impact from distant image regions by applying a weighted average to the overlapping regions. Moreover, we proposed Variance-Corrected Fusion (VCF), which corrects data variance at post-averaging, generating more accurate fusion for the Denoising Diffusion Probabilistic Model. Furthermore, we proposed a one-shot Style Alignment (SA), which generates a coherent style for large images by adjusting the initial input noise without adding extra computational burden. Extensive experiments demonstrated that the proposed fusion methods improved the quality of the generated image significantly. The proposed method can be widely applied as a plug-and-play module to enhance other fusion-based methods for large image generation. Code: https://github.com/TitorX/GVCFDiffusion
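The abstract describes Guided Fusion as a weighted average over overlapping regions and Variance-Corrected Fusion as a variance correction applied after averaging. The following is a minimal 1-D sketch of that idea, not the authors' implementation: the weight profile in `distance_weights`, the helper `fuse`, and the correction factor are all assumptions made for illustration. It relies only on the standard fact that a weighted average of independent unit-variance samples has variance equal to the sum of the squared normalised weights, so dividing by the square root of that sum restores unit variance. The sketch treats the patches as zero-mean noise; in a real sampler the rescaling would presumably apply only to the noise component, not to the predicted mean.

```python
import numpy as np


def distance_weights(patch_len: int) -> np.ndarray:
    """Hypothetical weight profile: largest near the patch centre, 1 at the edges."""
    pos = np.arange(patch_len)
    return (np.minimum(pos, patch_len - 1 - pos) + 1).astype(float)


def fuse(patches, starts, total_len, correct_variance=False):
    """Weighted-average overlapping 1-D patches (last axis) into one signal.

    Assumes every output position is covered by at least one patch and that
    the patches hold zero-mean, unit-variance, mutually independent noise.
    """
    patch_len = patches[0].shape[-1]
    w = distance_weights(patch_len)
    num = np.zeros(patches[0].shape[:-1] + (total_len,))  # sum of w_i * x_i
    den = np.zeros(total_len)                             # sum of w_i
    sq = np.zeros(total_len)                              # sum of w_i ** 2
    for patch, start in zip(patches, starts):
        window = slice(start, start + patch_len)
        num[..., window] += w * patch
        den[window] += w
        sq[window] += w ** 2
    fused = num / den
    if correct_variance:
        # Variance after averaging is sum((w_i / den) ** 2) = sq / den ** 2.
        # Dividing by its square root brings the variance back to 1.
        fused = fused / np.sqrt(sq / den ** 2)
    return fused


# Two patches of length 16 that overlap on positions 8..15.
rng = np.random.default_rng(0)
patch_len, starts, total_len = 16, [0, 8], 24
patches = [rng.standard_normal((20000, patch_len)) for _ in starts]

plain = fuse(patches, starts, total_len)
fixed = fuse(patches, starts, total_len, correct_variance=True)

overlap = slice(8, 16)
print("variance in overlap, plain average:", plain[:, overlap].var(axis=0).mean())
print("variance in overlap, corrected:    ", fixed[:, overlap].var(axis=0).mean())
```

In this toy setting the plain weighted average shrinks the variance inside the overlap, while the corrected version keeps it close to the unit variance of the inputs. Outside the overlap both variants return the original patch values unchanged.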

Code Implementations: 1 repo