CV, AI | Jul 4, 2023

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

arXiv:2307.01952v1 | 4817 citations | h-index: 20 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating high-quality images from text for applications in creative and visual domains, representing an incremental advancement in open-source image synthesis models.

The paper tackles improving high-resolution image synthesis by introducing SDXL, a latent diffusion model with a larger UNet backbone and novel conditioning schemes, achieving drastically improved performance over previous Stable Diffusion versions and competitive results with state-of-the-art image generators.

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
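One way to picture the "larger cross-attention context" that comes from adding a second text encoder is as a per-token concatenation of the two encoders' outputs along the channel axis. The sketch below is illustrative only and is not the released implementation: `fake_text_encoder` and `build_cross_attention_context` are hypothetical helpers, the random vectors stand in for real encoder outputs, and the widths 768 and 1280 are assumptions that do not appear in the abstract.

```python
import random


def fake_text_encoder(tokens, width, seed):
    """Hypothetical stand-in for a text encoder: one `width`-dim vector per token."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in tokens]


def build_cross_attention_context(emb_a, emb_b):
    """Concatenate two per-token embedding sequences along the channel axis."""
    if len(emb_a) != len(emb_b):
        raise ValueError("both encoders must return the same number of tokens")
    # Each token keeps its position; only the feature dimension grows.
    return [row_a + row_b for row_a, row_b in zip(emb_a, emb_b)]


tokens = ["a", "photo", "of", "a", "cat"]
emb_a = fake_text_encoder(tokens, width=768, seed=0)   # assumed width of encoder 1
emb_b = fake_text_encoder(tokens, width=1280, seed=1)  # assumed width of encoder 2
context = build_cross_attention_context(emb_a, emb_b)

# Sequence length is unchanged; channel width is the sum of both encoders.
print(len(context), len(context[0]))
```

Under these assumptions the UNet's cross-attention layers would attend over the same number of tokens as before, but each token carries a wider feature vector, which is one reason the attention projections (and hence the parameter count) grow.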

Code Implementations: 9 repos

Data from Papers with Code (CC-BY-SA-4.0)

