IV, CV, LG | May 22, 2023

GSURE-Based Diffusion Model Training with Corrupted Data

arXiv:2305.13128v2 | 41 citations
Originality: Incremental advance
AI Analysis

This work addresses the cost of collecting clean data in domains such as medical imaging, though it is an incremental improvement on existing diffusion model training methods.

The paper tackles the problem of training diffusion models without clean data by proposing a GSURE-based loss function that uses only corrupted data. On face images and MRI, it achieves generative performance comparable to fully supervised models.

Diffusion models have demonstrated impressive results in both data generation and downstream tasks such as inverse problems, text-based editing, classification, and more. However, training such models usually requires large amounts of clean signals which are often difficult or impossible to obtain. In this work, we propose a novel training technique for generative diffusion models based only on corrupted data. We introduce a loss function based on the Generalized Stein's Unbiased Risk Estimator (GSURE), and prove that under some conditions, it is equivalent to the training objective used in fully supervised diffusion models. We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI), where the use of undersampled data significantly alleviates data collection costs. Our approach achieves generative performance comparable to its fully supervised counterpart without training on any clean signals. In addition, we deploy the resulting diffusion model in various downstream tasks beyond the degradation present in the training set, showcasing promising results.
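The abstract does not spell out the GSURE loss, so the following is not the paper's objective. As a minimal sketch of the underlying idea, it uses classical SURE (the special case of plain additive Gaussian noise, with no degradation operator) to show how a squared error against a clean signal can be estimated from noisy data alone. The function name `sure_vs_true_mse`, the toy linear denoiser `f(y) = a*y`, and all parameter values are assumptions made for illustration.

```python
import random

def sure_vs_true_mse(num=20000, sigma=0.5, a=0.8, seed=0):
    """Compare a SURE estimate (no clean data) with the true mean squared error.

    Illustrative assumptions: y = x + noise, noise ~ N(0, sigma^2),
    and a toy linear denoiser f(y) = a * y.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(num)]   # clean signal, used only for checking
    y = [xi + rng.gauss(0.0, sigma) for xi in x]       # corrupted observations
    fy = [a * yi for yi in y]                          # denoiser output

    # True mean squared error: requires the clean signal x.
    true_mse = sum((f - xi) ** 2 for f, xi in zip(fy, x)) / num

    # SURE, per sample: mean (f(y) - y)^2 - sigma^2 + 2 * sigma^2 * (div f) / num.
    # For f(y) = a * y the divergence is a * num, so (div f) / num = a.
    residual = sum((f - yi) ** 2 for f, yi in zip(fy, y)) / num
    sure = residual - sigma ** 2 + 2 * sigma ** 2 * a
    return true_mse, sure

if __name__ == "__main__":
    true_mse, sure = sure_vs_true_mse()
    print(f"true MSE: {true_mse:.4f}  SURE estimate: {sure:.4f}")
```

The two numbers should agree closely even though `sure` never touches `x`. According to the abstract, the paper's GSURE-based loss plays an analogous role for diffusion training and is proven equivalent to the fully supervised objective under some conditions.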

Code Implementations: 1 repo