CV, LG · Dec 11, 2023

The Journey, Not the Destination: How Data Guides Diffusion Models

Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park, Aleksander Madry

MIT

arXiv:2312.06205v1 · 19.347 citations · h-index: 11 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of understanding and validating how training data influences the outputs of diffusion models. It is an incremental advance because it builds on existing attribution methods.

The paper tackles the problem of attributing images generated by diffusion models back to specific training examples. It proposes a framework that formalizes data attribution and supports counterfactual validation of attributions, and it applies this framework to models trained on CIFAR-10 and MS COCO. Code is provided.

Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data (that is, identifying specific training examples which caused an image to be generated) remains a challenge. In this paper, we propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. Then, we provide a method for computing these attributions efficiently. Finally, we apply our method to find (and evaluate) such attributions for denoising diffusion probabilistic models trained on CIFAR-10 and latent diffusion models trained on MS COCO. We provide code at https://github.com/MadryLab/journey-TRAK.
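The abstract describes attribution and counterfactual validation only at a high level. The sketch below illustrates the general idea on a toy stand-in. It assumes a 1-D Gaussian fit in place of a diffusion model and exact leave-one-out retraining in place of the authors' efficient estimator. It is not the paper's method, and every function name in it is hypothetical. Each training example is scored by how much removing it lowers the log-likelihood of a target sample. The scores are then checked counterfactually: the top-k attributed examples are removed, the model is refit, and the resulting drop is compared with the drop from removing k random examples.

```python
import math
import random

# Hypothetical helpers for a toy 1-D Gaussian "generative model";
# none of these names come from the paper or its repository.


def fit_gaussian(data):
    """Maximum-likelihood fit of a 1-D Gaussian; returns (mean, variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, max(var, 1e-12)  # guard against a degenerate zero variance


def log_likelihood(x, params):
    """Log-density of x under a Gaussian with the given (mean, variance)."""
    mu, var = params
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def counterfactual_drop(train, target, removed_idx):
    """Refit without the examples in removed_idx and return how much the
    log-likelihood of `target` falls (positive = the removed data mattered)."""
    removed = set(removed_idx)
    kept = [x for i, x in enumerate(train) if i not in removed]
    return (log_likelihood(target, fit_gaussian(train))
            - log_likelihood(target, fit_gaussian(kept)))


def loo_attributions(train, target):
    """Score each training example by its exact leave-one-out drop."""
    return [counterfactual_drop(train, target, [i]) for i in range(len(train))]


random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(200)]
target = 2.5  # stands in for a "generated" sample in the tail
k = 20

scores = loo_attributions(train, target)
top_k = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)[:k]
random_k = random.sample(range(len(train)), k)

# Counterfactual check: removing the top-k attributed examples should lower
# the target's likelihood far more than removing k random examples.
print("drop after removing top-k:   ", counterfactual_drop(train, target, top_k))
print("drop after removing random k:", counterfactual_drop(train, target, random_k))
```

Retraining once per removed example is feasible only for tiny models like this one. An efficient attribution method, as the abstract describes, aims to estimate such scores without that retraining loop. Retraining is then needed only to validate the final attributions.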

