LG | AI | CV | Apr 18, 2024

EdgeFusion: On-Device Text-to-Image Generation

arXiv:2404.11925v1 | 15 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the problem of deploying text-to-image models on resource-limited edge devices, representing an incremental improvement over existing methods.

The paper tackles the computational burden of Stable Diffusion for text-to-image generation on edge devices by developing strategies using a compact variant and advanced distillation, achieving photo-realistic image generation in two steps with under one-second latency.

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as Latent Consistency Model (LCM), and on employing architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches, we uniquely start with a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. This leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through our thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.
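To make "two steps" concrete, the sketch below shows the general shape of multistep consistency sampling as used by LCM-style models: predict a clean sample from pure noise, re-noise it to an intermediate timestep, then predict again. It is an illustration under stated assumptions, not EdgeFusion's implementation: the cosine schedule in `alpha_sigma`, the timestep values, and `toy_consistency_fn` (a stand-in for the distilled network) are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def alpha_sigma(t: float) -> Tuple[float, float]:
    """Hypothetical variance-preserving schedule on t in [0, 1] (alpha^2 + sigma^2 = 1)."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def multistep_consistency_sample(
    consistency_fn: Callable[[List[float], float], List[float]],
    dim: int,
    timesteps: Sequence[float] = (1.0, 0.5),  # two entries means two network calls
    seed: int = 0,
) -> List[float]:
    """Generic multistep consistency sampling loop; one network call per timestep."""
    rng = random.Random(seed)
    # Step 1: start from pure Gaussian noise and jump straight to a clean estimate.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x0 = consistency_fn(x, timesteps[0])
    # Remaining steps: re-noise the estimate to a smaller timestep, then predict again.
    for t in timesteps[1:]:
        alpha, sigma = alpha_sigma(t)
        x = [alpha * v + sigma * rng.gauss(0.0, 1.0) for v in x0]
        x0 = consistency_fn(x, t)
    return x0


def toy_consistency_fn(x: List[float], t: float) -> List[float]:
    """Stand-in for a trained network: the ideal map when all data sits at 0.25."""
    return [0.25 for _ in x]


sample = multistep_consistency_sample(toy_consistency_fn, dim=4)
```

With the default `timesteps`, the network is evaluated exactly twice, which is where the latency saving over a conventional multi-step sampler comes from. In a real latent diffusion setup, `x` would be a latent tensor and the final estimate would still need to be decoded to an image.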

