CV · LG · IV · Apr 21, 2023

Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

arXiv:2304.11267v2 · 63 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This work enables on-device generative AI for mobile users, improving speed and privacy, but it is incremental as it focuses on optimization rather than new model architectures.

The paper tackles the challenge of deploying large diffusion models on resource-constrained mobile devices through GPU-aware implementation optimizations. It reports the fastest inference latency published at the time: under 12 seconds for Stable Diffusion 1.4 on a Samsung S23 Ultra, for a 512x512 image with 20 iterations and without int8 quantization.

The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to-date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
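
To put the abstract's figures in perspective, here is a rough back-of-envelope sketch. It is not taken from the paper. It assumes exactly 1 billion parameters (the abstract says "over 1 billion", so this is a lower bound), the usual byte widths for each numeric type, and, for the timing estimate, that the whole 12-second budget is spent in the 20 denoising iterations. In practice some of that time goes to other pipeline stages, so the per-iteration figure is only an upper bound. The function names are hypothetical.

```python
# Back-of-envelope figures derived from the abstract; byte widths are assumptions.
PARAMS = 1_000_000_000  # "over 1 billion parameters", treated as a lower bound
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_gib(params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB (activations excluded)."""
    return params * bytes_per_param / 2**30


def per_iteration_budget_s(total_latency_s: float, iterations: int) -> float:
    """Upper bound on time per iteration if all latency were spent in the loop."""
    return total_latency_s / iterations


if __name__ == "__main__":
    for dtype, width in BYTES_PER_PARAM.items():
        print(f"{dtype}: {weight_footprint_gib(PARAMS, width):.2f} GiB of weights")
    print(f"per-iteration budget: {per_iteration_budget_s(12.0, 20):.2f} s")
```

Even at 16-bit precision the weights alone approach 2 GiB under these assumptions, which illustrates why the abstract names memory as a constraint on mobile devices.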
