CV · LG · Dec 11, 2023

The Journey, Not the Destination: How Data Guides Diffusion Models

MIT
arXiv:2312.06205v1 · 42 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of understanding and validating how training data influences diffusion model outputs. It is rated an incremental advance because it builds on existing data attribution methods.

The paper tackles the problem of attributing generated images back to specific training examples in diffusion models. It proposes a framework that formally defines data attribution and counterfactually validates attributions, then applies it to denoising diffusion models trained on CIFAR-10 and latent diffusion models trained on MS COCO, with code provided.

Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data, that is, identifying the specific training examples which caused an image to be generated, remains a challenge. In this paper, we propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. Then, we provide a method for computing these attributions efficiently. Finally, we apply our method to find (and evaluate) such attributions for denoising diffusion probabilistic models trained on CIFAR-10 and latent diffusion models trained on MS COCO. We provide code at https://github.com/MadryLab/journey-TRAK.
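The abstract pairs two ideas: scoring training examples by how strongly they drive a given output, and validating those scores counterfactually by retraining without the top-scoring examples. The sketch below illustrates only that general recipe and is not the paper's method, which targets diffusion models. It assumes a toy ridge-regression model with output f(z) = z · θ, uses classical influence-function scores in place of the paper's attribution method, and all helper names (`fit_ridge`, `influence_scores`, `counterfactual_drop`) and data are hypothetical.

```python
import numpy as np

LAM = 1e-3  # small ridge penalty so X^T X + LAM * I is always invertible


def fit_ridge(X, y, lam=LAM):
    """Hypothetical helper: closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def influence_scores(X, y, theta, z, lam=LAM):
    """Hypothetical helper: first-order influence of each example on f(z) = z . theta.

    Retraining without example i changes theta by about -H^{-1} x_i r_i, where
    H = X^T X + lam * I and r_i = y_i - x_i . theta. The score estimates how
    much f(z) would fall, so a positive score means example i pushes f(z) up.
    """
    H = X.T @ X + lam * np.eye(X.shape[1])
    residuals = y - X @ theta
    h_inv_z = np.linalg.solve(H, z)  # H is symmetric: z^T H^{-1} x_i = x_i . (H^{-1} z)
    return (X @ h_inv_z) * residuals


def counterfactual_drop(X, y, z, remove_idx, lam=LAM):
    """Hypothetical helper: retrain without remove_idx and return how much f(z) fell."""
    theta_full = fit_ridge(X, y, lam)
    keep = np.setdiff1d(np.arange(len(y)), remove_idx)
    theta_ablated = fit_ridge(X[keep], y[keep], lam)
    return float(z @ theta_full - z @ theta_ablated)


# Toy data (assumed): noisy linear targets and one "target" input z to attribute.
rng = np.random.default_rng(0)
n, d, k = 200, 5, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
z = rng.normal(size=d)

theta = fit_ridge(X, y)
scores = influence_scores(X, y, theta, z)
top_k = np.argsort(scores)[-k:]  # the k examples estimated to raise f(z) the most
random_k = rng.choice(n, size=k, replace=False)  # baseline: k arbitrary examples

top_drop = counterfactual_drop(X, y, z, top_k)
random_drop = counterfactual_drop(X, y, z, random_k)
print(f"f(z) drop, top-{k} removed:    {top_drop:.4f}")
print(f"f(z) drop, random {k} removed: {random_drop:.4f}")
```

If the attributions are meaningful, removing the top-k examples should lower f(z) far more than removing k random ones; the random baseline is what turns this into a counterfactual check. A linear model was chosen here because its attributions can be verified exactly: dividing an example's score by 1 - h_i, where h_i = x_i^T H^{-1} x_i is its leverage, gives precisely the change in f(z) from retraining without that example.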

Code implementations: 1 repo