LG AI ML Jun 10, 2021

Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation

Dongjun Kim, Seungjae Shin, Kyungwoo Song, Wanmo Kang, Il-Chul Moon

arXiv:2106.05527v529.4115 citations Has Code

Originality Incremental advance

AI Analysis

The paper addresses a training bottleneck in diffusion models for high-fidelity image generation. It is an incremental improvement that raises model performance across multiple benchmarks.

The paper tackles the inverse correlation between density estimation and sample generation in diffusion models. It identifies that small diffusion times dominate density estimation while large diffusion times dominate sample generation. It then introduces Soft Truncation, a training technique that replaces the fixed truncation hyperparameter with a random variable to balance loss scales across diffusion times. The authors report state-of-the-art performance on CIFAR-10, CelebA, CelebA-HQ 256x256, and STL-10.

Recent advances in diffusion models bring state-of-the-art performance on image generation tasks. However, empirical results from previous research in diffusion models imply an inverse correlation between density estimation and sample generation performances. This paper investigates with sufficient empirical evidence that such inverse correlation happens because density estimation is significantly contributed by small diffusion time, whereas sample generation mainly depends on large diffusion time. However, training a score network well across the entire diffusion time is demanding because the loss scale is significantly imbalanced at each diffusion time. For successful training, therefore, we introduce Soft Truncation, a universally applicable training technique for diffusion models, that softens the fixed and static truncation hyperparameter into a random variable. In experiments, Soft Truncation achieves state-of-the-art performance on CIFAR-10, CelebA, CelebA-HQ 256x256, and STL-10 datasets.
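To make the idea of a randomized truncation concrete, here is a minimal sketch of how diffusion times might be drawn during training. It is an illustration under stated assumptions, not the authors' implementation: the abstract only says the truncation hyperparameter becomes a random variable, so the log-uniform prior over the truncation bound, the default values `eps_min=1e-5` and `t_max=1.0`, and the function names `sample_truncation` and `sample_times` are all hypothetical choices made for this example.

```python
import math
import random


def sample_truncation(rng, eps_min=1e-5, t_max=1.0):
    """Draw a truncation bound tau in [eps_min, t_max].

    Assumption: a log-uniform prior over tau. The abstract does not
    specify the distribution; this is only one plausible choice.
    """
    log_tau = rng.uniform(math.log(eps_min), math.log(t_max))
    # Clamp to guard against floating-point round-off at the endpoints.
    return min(max(math.exp(log_tau), eps_min), t_max)


def sample_times(rng, batch_size, tau, t_max=1.0):
    """Sample diffusion times uniformly on the truncated interval [tau, t_max]."""
    return [rng.uniform(tau, t_max) for _ in range(batch_size)]


def fixed_truncation_batch(rng, batch_size, eps=1e-5, t_max=1.0):
    """Baseline: every batch uses the same static truncation bound eps."""
    return eps, sample_times(rng, batch_size, eps, t_max)


def soft_truncation_batch(rng, batch_size, eps_min=1e-5, t_max=1.0):
    """Randomized variant: a fresh truncation bound is drawn for each batch."""
    tau = sample_truncation(rng, eps_min, t_max)
    return tau, sample_times(rng, batch_size, tau, t_max)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in range(3):
        tau, times = soft_truncation_batch(rng, batch_size=4)
        print(f"step {step}: tau={tau:.2e}, min t={min(times):.4f}")
```

In this sketch, batches with a large `tau` train only on large diffusion times, while batches with a small `tau` also cover times near zero. In an actual training loop, the sampled times would be fed to the score-matching loss in place of times drawn from a fixed interval.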
