Latent Geometry and Memorization in Generative Models
This work addresses the challenge of evaluating generative models, which matters to both researchers and practitioners, though the contribution is incremental in that it builds on existing density-based analysis.
The paper tackles the problem of distinguishing novel generation from memorization in trained generative models by studying the output density directly, and it demonstrates that memorization corresponds to a density of delta functions concentrated on the memorized examples.
It can be difficult to tell whether a trained generative model has learned to generate novel examples or has simply memorized a specific set of outputs. In published work, it is common to attempt to address this visually, for example by displaying a generated example and its nearest neighbor(s) in the training set (under, say, the L2 metric). As any generative model induces a probability density on its output domain, we propose studying this density directly. We first study the geometry of the latent representation and generator, relate this to the output density, and then develop techniques to compute and inspect the output density. As an application, we demonstrate that "memorization" tends toward a density made of delta functions concentrated on the memorized examples. We note that without first understanding the geometry, the measurement would be essentially impossible to make.
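To make the link between generator geometry and output density concrete, the following is a minimal sketch rather than the paper's own procedure. It assumes a smooth, injective generator g with a standard normal latent prior, and uses the standard change-of-variables formula log p(g(z)) = log p(z) - 0.5 * log det(J^T J), where J is the Jacobian of g at z. The toy generator, the matrix `A`, the point `x0`, and all function names are hypothetical and chosen only for illustration. Shrinking the scale `s` collapses the generator's image toward the single point `x0`, and the log density grows without bound, which is the delta-function behavior the abstract associates with memorization.

```python
import numpy as np


def latent_log_density(z):
    """Log density of a standard normal latent prior at z (assumed prior)."""
    d = z.shape[0]
    return -0.5 * (z @ z) - 0.5 * d * np.log(2.0 * np.pi)


def numerical_jacobian(g, z, eps=1e-6):
    """Central-difference Jacobian of g at z, shape (output_dim, latent_dim)."""
    cols = []
    for i in range(z.shape[0]):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((g(z + step) - g(z - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def output_log_density(g, z):
    """Log density induced at g(z) on the generator's image.

    Uses log p(x) = log p(z) - 0.5 * log det(J^T J), which assumes g is
    smooth and injective near z.
    """
    J = numerical_jacobian(g, z)
    _, logdet = np.linalg.slogdet(J.T @ J)
    return latent_log_density(z) - 0.5 * logdet


# Hypothetical toy generator: maps a 2-D latent into 3-D output space.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x0 = np.array([0.5, -1.0, 2.0])  # stand-in for a "memorized" example


def make_generator(s):
    """Generator whose image shrinks toward x0 as s goes to 0."""
    return lambda z: x0 + s * (A @ z)


if __name__ == "__main__":
    z = np.array([0.3, -0.2])
    for s in (1.0, 0.1, 0.01):
        print(s, output_log_density(make_generator(s), z))
```

For this linear toy case, each tenfold reduction in `s` raises the log density by d * log(10), with d the latent dimension. A real generator would need its Jacobian from automatic differentiation rather than finite differences, and the formula breaks down wherever the Jacobian loses rank.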