Patronus: Bringing Transparency to Diffusion Models with Prototypes
This work addresses the lack of transparency in diffusion models, a concern for both researchers and practitioners. Its interpretability approach is new for diffusion models, although it is largely an incremental combination of existing methods.
The paper tackles the opacity of diffusion models in image generation by introducing Patronus, an interpretable model that integrates a prototypical network into DDPMs. The network extracts prototypes, and generation is conditioned on the resulting prototype activation vector, which enables tasks such as image manipulation and the detection of shortcut learning without annotations.
Diffusion-based generative models, such as Denoising Diffusion Probabilistic Models (DDPMs), have achieved remarkable success in image generation, but their step-by-step denoising process remains opaque, leaving critical aspects of the generation mechanism unexplained. To address this, we introduce \emph{Patronus}, an interpretable diffusion model inspired by ProtoPNet. Patronus integrates a prototypical network into DDPMs, enabling the extraction of prototypes and the conditioning of the generation process on the resulting prototype activation vector. This design enhances interpretability by showing the learned prototypes and how they influence the generation process. Additionally, the model supports downstream tasks such as image manipulation, enabling more transparent and controlled modifications. Moreover, Patronus can reveal shortcut learning in the generation process by detecting unwanted correlations between learned prototypes. Notably, Patronus operates entirely without annotations or text prompts. This work opens new avenues for understanding and controlling diffusion models through prototype-based interpretability. Our code is available at \href{https://github.com/nina-weng/patronus}{https://github.com/nina-weng/patronus}.
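To make the role of the prototype activation vector concrete, the following is a minimal, dependency-free sketch, not the released implementation. It assumes a ProtoPNet-style similarity, log((d + 1) / (d + eps)) with max pooling over patches; whether Patronus uses this exact form is an assumption. The helper names `prototype_activations`, `edit_activation`, and `pearson` are hypothetical, and the denoiser that would consume the vector is omitted.

```python
import math


def prototype_activations(patch_features, prototypes, eps=1e-4):
    """Return one activation per prototype: the max similarity over patches.

    Assumption: similarity is ProtoPNet's log((d + 1) / (d + eps)), where d is
    the squared L2 distance between a patch feature and a prototype.
    """
    activations = []
    for proto in prototypes:
        best = -math.inf
        for patch in patch_features:
            d = sum((a - b) ** 2 for a, b in zip(patch, proto))
            best = max(best, math.log((d + 1.0) / (d + eps)))
        activations.append(best)
    return activations


def edit_activation(activations, index, value):
    """Copy the activation vector and overwrite one prototype's entry.

    Illustrates manipulation: a conditional denoiser would receive the edited
    vector in place of the original one.
    """
    edited = list(activations)
    edited[index] = value
    return edited


def pearson(xs, ys):
    """Pearson correlation of two prototypes' activations across samples.

    A strong correlation between prototypes that should be unrelated is one
    possible signal of shortcut learning. Returns 0.0 if either input is
    constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy example: two 2-D patch features and two prototypes (made-up values).
patches = [[0.0, 0.0], [1.0, 1.0]]
prototypes = [[1.0, 1.0], [5.0, 5.0]]
acts = prototype_activations(patches, prototypes)
suppressed = edit_activation(acts, 0, 0.0)  # switch off the first prototype
```

In this toy example the first prototype coincides with one of the patches, so its activation dominates the vector; setting that entry to zero mimics the kind of edit that would be passed to the conditional denoiser for manipulation.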