CV | Jun 1, 2023

The Hidden Language of Diffusion Models

DeepMind, Meta AI
arXiv:2306.00966v3 | 38 citations | h-index: 72
Originality: Incremental advance
AI Analysis

This work addresses the interpretability of diffusion models, a concern for both researchers and practitioners. It is an incremental advance: it builds on existing models rather than introducing a new paradigm.

The authors tackled the problem of interpreting the internal representations of concepts in text-to-image diffusion models, and developed Conceptor, a method that decomposes concepts into human-interpretable textual elements, revealing non-trivial structures like visual connections and biases in models such as Stable Diffusion.

Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual prompt. However, the internal representations learned by these models remain an enigma. In this work, we present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model. This interpretation is obtained by decomposing the concept into a small set of human-interpretable textual elements. Applied over the state-of-the-art Stable Diffusion model, Conceptor reveals non-trivial structures in the representations of concepts. For example, we find surprising visual connections between concepts that transcend their textual semantics. We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings of the concept. Through a large battery of experiments, we demonstrate Conceptor's ability to provide meaningful, robust, and faithful decompositions for a wide variety of abstract, concrete, and complex textual concepts, while naturally connecting each decomposition element to its corresponding visual impact on the generated images. Our code will be available at: https://hila-chefer.github.io/Conceptor/
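
The idea of decomposing a concept into "a small set of human-interpretable textual elements" can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the per-token weights are obtained, so the sketch assumes they are already given. The function names, the two-dimensional embeddings, and all numbers are hypothetical. It only shows the final step one might imagine: keep the few highest-weighted vocabulary tokens and combine their embeddings into a single sparse pseudo-token.

```python
from typing import Dict, List, Tuple


def top_k_decomposition(
    coefficients: Dict[str, float], k: int
) -> List[Tuple[str, float]]:
    """Keep the k tokens with the largest weights, renormalised to sum to 1."""
    ranked = sorted(coefficients.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(weight for _, weight in ranked)
    return [(token, weight / total) for token, weight in ranked]


def combine(
    embeddings: Dict[str, List[float]], decomposition: List[Tuple[str, float]]
) -> List[float]:
    """Weighted sum of the selected token embeddings (the sparse pseudo-token)."""
    dim = len(next(iter(embeddings.values())))
    pseudo = [0.0] * dim
    for token, weight in decomposition:
        for i, value in enumerate(embeddings[token]):
            pseudo[i] += weight * value
    return pseudo


# Made-up toy values; a real model would have thousands of tokens
# and high-dimensional embeddings. The weights are assumed, not learned here.
embeddings = {"painter": [1.0, 0.0], "brush": [0.0, 1.0], "car": [-1.0, -1.0]}
coefficients = {"painter": 0.6, "brush": 0.2, "car": 0.05}

decomposition = top_k_decomposition(coefficients, k=2)
pseudo_token = combine(embeddings, decomposition)
```

Because only `k` tokens survive, the resulting list of (token, weight) pairs is short enough for a person to read, which is the sense of "human-interpretable" used in the abstract.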

Code Implementations: 1 repo