Multi-scale Generative Modeling for Fast Sampling
This work addresses sampling efficiency for generative models, offering incremental improvements for applications requiring fast image generation.
The paper tackles the problem of slow sampling in diffusion-based generative models by proposing a multi-scale approach in the wavelet domain, which reduces sampling steps by 40% and trainable parameters by 30% while maintaining competitive image quality on benchmark datasets.
Working in the spatial domain can suffer from ill-conditioned scores caused by power-law decay, and recent advances in diffusion-based generative models have shown that moving to the wavelet domain offers a promising alternative. The wavelet domain, however, poses its own challenges, notably the sparse representation of high-frequency coefficients, which deviates significantly from the Gaussian assumptions of the diffusion process. To this end, we propose a multi-scale generative model in the wavelet domain that handles low- and high-frequency bands with distinct strategies: score-based generative modeling with well-conditioned scores for the low-frequency bands, and multi-scale generative adversarial learning for the high-frequency bands. As supported by theoretical analysis and experimental results, our model significantly improves performance and reduces the number of trainable parameters, the number of sampling steps, and the sampling time.
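The split into low- and high-frequency bands, and the sparsity of the high-frequency coefficients, can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a single-level orthonormal Haar transform written directly in NumPy, and the function names (`haar_dwt2`, `haar_idwt2`, `near_zero_fraction`), the synthetic test image, and the tolerance of 0.1 are hypothetical choices made only for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar transform.

    Assumes x is a 2D array with even height and width.
    Returns the low-frequency band and a tuple of three detail bands.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0          # local average (low-frequency band)
    col_detail = (a - b + c - d) / 2.0   # differences across columns
    row_detail = (a + b - c - d) / 2.0   # differences across rows
    diag_detail = (a - b - c + d) / 2.0  # diagonal differences
    return low, (col_detail, row_detail, diag_detail)


def haar_idwt2(low, details):
    """Inverse of haar_dwt2: rebuild the image from its four bands."""
    col_detail, row_detail, diag_detail = details
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (low + col_detail + row_detail + diag_detail) / 2.0
    x[0::2, 1::2] = (low - col_detail + row_detail - diag_detail) / 2.0
    x[1::2, 0::2] = (low + col_detail - row_detail - diag_detail) / 2.0
    x[1::2, 1::2] = (low - col_detail - row_detail + diag_detail) / 2.0
    return x


def near_zero_fraction(coeffs, tol=0.1):
    """Fraction of coefficients whose magnitude is below tol (a crude sparsity measure)."""
    return float(np.mean(np.abs(coeffs) < tol))


# Hypothetical test image: a smooth sinusoid plus one sharp horizontal edge.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
img = np.sin(2 * np.pi * xx / n) + (yy > 20).astype(float)

low, details = haar_dwt2(img)
high = np.concatenate([band.ravel() for band in details])

print("near-zero fraction, low-frequency band:  ", near_zero_fraction(low))
print("near-zero fraction, high-frequency bands:", near_zero_fraction(high))
print("perfect reconstruction:", np.allclose(haar_idwt2(low, details), img))
```

On this toy image almost all high-frequency coefficients sit near zero, with a few large values along the edge, whereas the low-frequency band is dense. That heavy concentration at zero is the kind of departure from Gaussian statistics that motivates treating the two groups of bands differently.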