LG, AI | Feb 2

Sparsely Supervised Diffusion

Wenshuai Zhao, Zhiyuan Li, Yi Zhao, Mohammad Hassan Vali, Martin Trapp, Joni Pajarinen, Juho Kannala, Arno Solin

arXiv:2602.02699v11.4

Originality: Incremental advance

AI Analysis

This work addresses a specific issue in generative AI that is relevant to researchers and practitioners, but the advance is incremental because it builds on existing diffusion model frameworks.

The paper tackles the problem of spatially inconsistent generation in diffusion models by proposing a sparsely supervised learning method with a masking strategy, which achieves competitive FID scores and avoids training instability on small datasets.

Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of their denoising mechanisms. This can yield samples that are locally plausible but globally inconsistent. To mitigate this issue, we propose sparsely supervised learning for diffusion models, a simple yet effective masking strategy that can be implemented with only a few lines of code. Interestingly, the experiments show that it is safe to mask up to 98% of pixels during diffusion model training. Our method delivers competitive FID scores across experiments and, most importantly, avoids training instability on small datasets. Moreover, the masking strategy reduces memorization and promotes the use of essential contextual information during generation.
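
The abstract does not spell out where the mask is applied, so the following is only a minimal sketch of one plausible reading: the denoising loss is evaluated on a random subset of pixel positions, and the masked positions contribute nothing. The function name `sparse_mse`, the per-sample Bernoulli mask shared across channels, and the normalization by the number of kept values are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def sparse_mse(pred, target, mask_ratio=0.98, rng=None):
    """Mean squared error over a random subset of pixel positions.

    pred, target: arrays of shape (batch, channels, height, width).
    mask_ratio:   fraction of pixel positions excluded from the loss
                  (0.98 mirrors the "up to 98%" figure in the abstract).
    Returns the scalar loss and the binary keep-mask that was sampled.
    """
    rng = np.random.default_rng() if rng is None else rng
    b, c, h, w = pred.shape
    # Assumption: one mask per sample, shared across channels; 1 = supervised.
    keep = (rng.random((b, 1, h, w)) >= mask_ratio).astype(pred.dtype)
    sq_err = (pred - target) ** 2 * keep  # mask broadcasts over channels
    n_kept = keep.sum() * c               # number of supervised values
    # Guard against the rare case where every pixel was masked out.
    return sq_err.sum() / max(n_kept, 1.0), keep
```

In a training loop, `pred` would be the network output (for example, the predicted noise) and `target` the corresponding regression target; with `mask_ratio=0.0` the function reduces to the ordinary dense MSE.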
