CV
Nov 14, 2023

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

arXiv:2311.09257v5197 citations
h-index: 14
Originality: Incremental advance
AI Analysis

The paper addresses slow inference in text-to-image generation and offers a significant speed improvement, though the advance is incremental because it builds on existing diffusion and GAN methods.

The paper tackles the high computational cost of text-to-image diffusion models by introducing UFOGen, a hybrid model that combines diffusion models with a GAN objective to generate high-quality images from text in a single step.

Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models.

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
