CVIVJul 11, 2025

Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

arXiv:2507.08422v29 citationsh-index: 5
Originality Incremental advance
AI Analysis

This addresses the computational bottleneck for deploying diffusion transformers in real-world applications, offering a complementary acceleration method to existing temporal techniques.

The paper tackles the heavy computation of diffusion transformers for image and video generation by proposing RALU, a training-free framework that accelerates inference along the spatial dimension, achieving up to 7.0x speed-up on FLUX and 3.0x on Stable Diffusion 3 with minimal quality degradation.

Diffusion transformers have emerged as an alternative to U-net-based diffusion models for high-fidelity image and video generation, offering superior scalability. However, their heavy computation remains a major obstacle to real-world deployment. Existing acceleration methods primarily exploit the temporal dimension such as reusing cached features across diffusion timesteps. Here, we propose Region-Adaptive Latent Upsampling (RALU), a training-free framework that accelerates inference along spatial dimension. RALU performs mixed-resolution sampling across three stages: 1) low-resolution denoising latent diffusion to efficiently capture global semantic structure, 2) region-adaptive upsampling on specific regions prone to artifacts at full-resolution, and 3) all latent upsampling at full-resolution for detail refinement. To stabilize generations across resolution transitions, we leverage noise-timestep rescheduling to adapt the noise level across varying resolutions. Our method significantly reduces computation while preserving image quality by achieving up to 7.0$\times$ speed-up on FLUX and 3.0$\times$ on Stable Diffusion 3 with minimal degradation. Furthermore, RALU is complementary to existing temporal accelerations such as caching methods, thus can be seamlessly integrated to further reduce inference latency without compromising generation quality.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes