CL · CV · LG · Feb 16, 2023

Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension

arXiv:2302.09301v1 · 7 citations · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This work provides incremental insights into the mathematical understanding of prompting in text-to-image models, potentially aiding future studies on prompt impact.

The authors investigated how prompts affect the geometric properties of Stable Diffusion's internal representations by measuring intrinsic dimension, finding that prompt choice significantly influences intrinsic dimension in certain layers, with correlations to prompt perplexity in bottleneck layers but not in latent layers.

Prompting has become an important mechanism by which users can more effectively interact with many flavors of foundation model. Indeed, the last several years have shown that well-honed prompts can sometimes unlock emergent capabilities within such models. While there has been a substantial amount of empirical exploration of prompting within the community, relatively few works have studied prompting at a mathematical level. In this work we aim to take a first step towards understanding basic geometric properties induced by prompts in Stable Diffusion, focusing on the intrinsic dimension of internal representations within the model. We find that choice of prompt has a substantial impact on the intrinsic dimension of representations at both layers of the model which we explored, but that the nature of this impact depends on the layer being considered. For example, in certain bottleneck layers of the model, intrinsic dimension of representations is correlated with prompt perplexity (measured using a surrogate model), while this correlation is not apparent in the latent layers. Our evidence suggests that intrinsic dimension could be a useful tool for future studies of the impact of different prompts on text-to-image models.
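To make "intrinsic dimension of internal representations" concrete, the following is a minimal sketch of one common estimator, TwoNN, which uses the ratio of each point's second to first nearest-neighbor distance. This is an assumption for illustration: the abstract does not say which estimator the authors used, and the function name `twonn_intrinsic_dimension` and the synthetic data are hypothetical. In practice the rows of `points` would be flattened activations collected from one layer of the model for a fixed prompt.

```python
import numpy as np


def twonn_intrinsic_dimension(points):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    points: array of shape (n_samples, n_features), one representation per row.
    Illustrative sketch only; not necessarily the estimator used in the paper.
    """
    x = np.asarray(points, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(x ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)   # clip small negative rounding errors
    np.fill_diagonal(d2, np.inf)  # exclude each point's distance to itself

    # After partitioning at index 1, columns 0 and 1 hold the two smallest
    # squared distances in each row, in ascending order.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # The ratios mu follow a Pareto law whose shape parameter is the
    # intrinsic dimension; its maximum-likelihood estimate is n / sum(log mu).
    return keep.sum() / np.sum(np.log(mu))


if __name__ == "__main__":
    # Synthetic check: a 2-dimensional plane linearly embedded in 10 dimensions.
    rng = np.random.default_rng(0)
    plane = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
    print(twonn_intrinsic_dimension(plane))
```

On the synthetic plane the estimate should land near 2 even though the ambient space has 10 coordinates, which is the sense in which intrinsic dimension can differ from layer width. Comparing such estimates across prompts and layers is the kind of measurement the abstract describes.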

