CV · AI · LG | Jun 9, 2023

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Yida Chen, Fernanda Viégas, Martin Wattenberg

Harvard

arXiv:2306.05720v2 · 18.135 citations · h-index: 68 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses an interpretability question relevant to AI researchers and practitioners. It offers insight into how LDMs generate coherent 3D scenes without explicit depth supervision, though its contribution is incremental.

The study investigates whether latent diffusion models (LDMs) internally represent simple scene geometry. It finds that internal activations encode linear representations of 3D depth and of a salient-object/background distinction, and that these representations play a causal role in image synthesis and can support simple high-level editing.
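A linear probe of the kind described here can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes activations have already been extracted and flattened so that each row is one spatial position's feature vector, and it substitutes synthetic random data for real LDM activations. Depth is fit by ridge regression and the salient-object/background label by logistic regression; the function names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np


def fit_linear_depth_probe(acts, depth, l2=1e-3):
    """Ridge-regression probe (hypothetical helper): depth ≈ acts @ w + b."""
    n, d = acts.shape
    X = np.hstack([acts, np.ones((n, 1))])  # append a bias column
    reg = l2 * np.eye(d + 1)
    reg[-1, -1] = 0.0  # do not penalize the bias term
    wb = np.linalg.solve(X.T @ X + reg, X.T @ depth)
    return wb[:-1], wb[-1]


def fit_linear_salience_probe(acts, labels, lr=0.1, steps=500):
    """Logistic-regression probe (hypothetical helper): P(salient) = sigmoid(acts @ w + b)."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))  # predicted probability of "salient"
        grad = p - labels  # gradient of mean cross-entropy w.r.t. the logits
        w -= lr * acts.T @ grad / n
        b -= lr * grad.mean()
    return w, b


# Synthetic stand-in for per-position activations (assumption: NOT real LDM data).
rng = np.random.default_rng(0)
n, d = 2000, 16
acts = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
depth = acts @ true_w + 0.5 + 0.01 * rng.normal(size=n)  # linearly decodable by construction
labels = (acts @ true_w > 0).astype(float)  # 1 = salient object, 0 = background

# Fit both probes and score them (on training data, for brevity).
w_d, b_d = fit_linear_depth_probe(acts, depth)
pred = acts @ w_d + b_d
r2 = 1 - ((depth - pred) ** 2).sum() / ((depth - depth.mean()) ** 2).sum()

w_s, b_s = fit_linear_salience_probe(acts, labels)
acc = ((acts @ w_s + b_s > 0) == (labels == 1)).mean()
```

On real activations, a high score on held-out positions and images is what would count as evidence of a linear representation. Here the probes fit well only because the synthetic targets were built to be linear in the features.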

Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output. Project page: https://yc015.github.io/scene-representation-diffusion-model/
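The intervention idea can be illustrated with a deliberately simplified sketch. Given a trained linear probe with weights `w` and bias `b`, it shifts each activation vector along `w` by the smallest L2 amount that makes the probe output equal a chosen target. This closed-form projection is an assumption made for illustration; the paper's actual intervention procedure may differ. In a real model, the edited activations would then be fed back into the remaining denoising steps. The helper name `edit_toward_target` is hypothetical.

```python
import numpy as np


def edit_toward_target(acts, w, b, target):
    """Minimal-L2 edit (hypothetical helper): move each row of `acts` along `w`
    so that the linear probe output acts @ w + b equals `target`."""
    current = acts @ w + b  # current probe output per position
    delta = (target - current) / (w @ w)  # step size along w for each row
    return acts + np.outer(delta, w)  # only the component along w changes


# Toy example with random activations and a random probe (assumption: not real data).
rng = np.random.default_rng(1)
acts = rng.normal(size=(8, 4))
w = rng.normal(size=4)
b = 0.2
target = np.full(8, -3.0)  # e.g. push these positions to the "background" side of a salience probe
edited = edit_toward_target(acts, w, b, target)
```

Because the edit moves each vector only along `w`, every component orthogonal to the probe direction is left unchanged. This is the simplest way to change what the probe reads out while disturbing the rest of the representation as little as possible.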
