LG ML | Jul 26, 2024

Heavy-Tailed Diffusion with Denoising Lévy Probabilistic Models

Dario Shariatian, Umut Simsekli, Alain Durmus

arXiv:2407.18609v410.411 citations | h-index: 30 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses mode collapse and class imbalance in diffusion models for machine learning applications, offering a more flexible and efficient alternative to existing heavy-tailed methods.

The paper tackles the challenge of using heavy-tailed noise distributions in diffusion models by proposing Denoising Lévy Probabilistic Models (DLPM), which replace the Gaussian noise in DDPM with α-stable noise. The reported results are improved coverage of data distribution tails, better robustness to unbalanced datasets, and faster computation with fewer backward steps.

Exploring noise distributions beyond Gaussian in diffusion models remains an open challenge. While Gaussian-based models succeed within a unified SDE framework, recent studies suggest that heavy-tailed noise distributions, like $α$-stable distributions, may better handle mode collapse and effectively manage datasets exhibiting class imbalance, heavy tails, or prominent outliers. Recently, Yoon et al. (NeurIPS 2023) presented the Lévy-Itô model (LIM), directly extending the SDE-based framework to a class of heavy-tailed SDEs, where the injected noise followed an $α$-stable distribution, a rich class of heavy-tailed distributions. However, the LIM framework relies on highly involved mathematical techniques with limited flexibility, potentially hindering broader adoption and further development. In this study, instead of starting from the SDE formulation, we extend the denoising diffusion probabilistic model (DDPM) by replacing the Gaussian noise with $α$-stable noise. By using only elementary proof techniques, the proposed approach, Denoising Lévy Probabilistic Models (DLPM), boils down to vanilla DDPM with minor modifications. As opposed to the Gaussian case, DLPM and LIM yield different training algorithms and different backward processes, leading to distinct sampling algorithms. These fundamental differences translate favorably for DLPM as compared to LIM: our experiments show improvements in coverage of data distribution tails, better robustness to unbalanced datasets, and improved computation times, requiring a smaller number of backward steps.
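
As a rough illustration of the noise swap described in the abstract, the sketch below draws symmetric $α$-stable noise with the Chambers-Mallows-Stuck method and applies a single noising step to a batch of data. It is not the authors' implementation: the function names, the `gamma` and `sigma` coefficients, and the restriction to the symmetric case are assumptions made for illustration, and the noise schedule and training loss of DLPM are not reproduced here.

```python
import numpy as np


def sample_sas(alpha, size, rng):
    """Draw symmetric alpha-stable samples (unit scale) via Chambers-Mallows-Stuck.

    alpha = 2 gives a Gaussian with variance 2; alpha = 1 gives a standard Cauchy.
    Smaller alpha means heavier tails.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-rate exponential
    if alpha == 1.0:
        return np.tan(u)                          # Cauchy special case
    return (
        np.sin(alpha * u)
        / np.cos(u) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def forward_noise(x0, gamma, sigma, alpha, rng):
    """One illustrative noising step: scale the data, add alpha-stable noise.

    gamma and sigma are hypothetical coefficients, not the paper's schedule.
    """
    eps = sample_sas(alpha, x0.shape, rng)
    return gamma * x0 + sigma * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(8, 2))  # toy data batch
    xt = forward_noise(x0, gamma=0.9, sigma=0.5, alpha=1.7, rng=rng)
    print(xt.shape)
```

Setting `alpha=2.0` recovers Gaussian noise (with variance 2 at unit scale), so the same code path covers the standard DDPM-style perturbation and its heavy-tailed counterpart.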
