CV · Dec 12, 2023

Boosting Latent Diffusion with Flow Matching

arXiv:2312.07360v3 · 46 citations · h-index: 13 · ECCV
Originality: Incremental advance
AI Analysis

This paper addresses the computational inefficiency of visual synthesis in applications that require high-resolution images, though its contribution is incremental because it combines existing methods.

The paper tackles the slow training and synthesis of diffusion models by integrating flow matching with a frozen diffusion model and convolutional decoder, achieving state-of-the-art high-resolution image synthesis at $1024^2$ pixels with minimal computational cost and scaling up to $2048^2$ pixels.

Visual synthesis has recently seen significant leaps in performance, largely due to breakthroughs in generative models. Diffusion models have been a key enabler, as they excel in image diversity. However, this comes at the cost of slow training and synthesis, which is only partially alleviated by latent diffusion. To this end, flow matching is an appealing approach due to its complementary characteristics of faster training and inference but less diverse synthesis. We demonstrate that introducing flow matching between a frozen diffusion model and a convolutional decoder enables high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then effectively provide the necessary visual diversity, while flow matching efficiently enhances resolution and detail by mapping the small latent space to a high-dimensional one. These latents are then projected to high-resolution images by the subsequent convolutional decoder of the latent diffusion approach. Combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders, state-of-the-art high-resolution image synthesis is achieved at $1024^2$ pixels with minimal computational cost. Scaling our method up further, we can reach resolutions up to $2048^2$ pixels. Importantly, our approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easily integrable into various diffusion model frameworks.
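
To make the latent-to-latent mapping in the abstract more concrete, here is a minimal toy sketch of flow matching between a small and a larger latent grid. It is not the authors' implementation. The straight-line path, the nearest-neighbour upsampling used to bring the small latent to the target size, and every function name are assumptions made for illustration. An oracle velocity stands in for the trained network so the example runs without any learning.

```python
import random

def upsample_nearest(latent, factor):
    """Nearest-neighbour upsampling of a 2D grid (assumed here, for illustration)."""
    out = []
    for row in latent:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def interpolate(x0, x1, t):
    """Point on the assumed straight path x_t = (1 - t) * x0 + t * x1."""
    return [[(1 - t) * a + t * b for a, b in zip(r0, r1)]
            for r0, r1 in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the straight path: d x_t / dt = x1 - x0."""
    return [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]

def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = [list(r) for r in x0]
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [[a + dt * b for a, b in zip(rx, rv)] for rx, rv in zip(x, v)]
    return x

# Toy data: a 2x2 "small" latent, upsampled to 4x4 as the start of the flow.
low = [[0.0, 1.0], [2.0, 3.0]]
x0 = upsample_nearest(low, 2)

# Hypothetical stand-in for the matching high-resolution latent.
rng = random.Random(0)
x1 = [[v + rng.uniform(-0.5, 0.5) for v in row] for row in x0]

# Oracle velocity in place of a trained network (an assumption of this sketch).
def oracle(x, t):
    return target_velocity(x0, x1)

result = euler_sample(x0, oracle, num_steps=4)
max_err = max(abs(a - b) for ra, rb in zip(result, x1) for a, b in zip(ra, rb))
```

In training, a network would regress `target_velocity(x0, x1)` from `interpolate(x0, x1, t)` and `t`. With the oracle the path is exactly straight, so a handful of Euler steps recovers `x1` up to floating-point error, which hints at why few integration steps can suffice when the learned paths are close to straight.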

Code Implementations: 2 repos