Addressing degeneracies in latent interpolation for diffusion models
The work addresses a practical issue for users of diffusion models in applications such as data augmentation and image morphing, but the contribution is incremental because it builds on existing interpolation methods.
The paper tackles degeneracies that arise in latent interpolation for diffusion models when interpolating between many input images. It proposes a normalization scheme that reduces degeneration and improves image quality as measured by FID and CLIP distance.
There is increasing interest in using image-generating diffusion models for deep data augmentation and image morphing. In this context, it is useful to interpolate between latents produced by inverting a set of input images, in order to generate new images that represent a mixture of the inputs. We observe that such interpolation can easily lead to degenerate results when the number of inputs is large. We analyze the cause of this effect theoretically and experimentally, and suggest a suitable remedy. The suggested approach is a relatively simple normalization scheme that is easy to use whenever interpolation between latents is needed. We measure image quality using FID and CLIP embedding distance and show experimentally that baseline interpolation methods cause quality metrics to drop long before the degeneration becomes clearly visible. In contrast, our method significantly reduces the degeneration effect and improves quality metrics even in non-degenerate situations.
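To make the degeneration effect concrete, the following is a minimal sketch of one way a linear mixture of many latents can drift away from what a diffusion model expects. It assumes, for illustration only, that inverted latents behave like independent standard Gaussian vectors, so that a typical latent of dimension d has norm close to sqrt(d). Under that assumption, the uniform average of n latents has norm close to sqrt(d / n), which shrinks as n grows. The function names `mean_interpolate` and `rescale_to_expected_norm` are hypothetical, and the norm rescaling shown here is a generic remedy chosen for illustration; the abstract does not specify the paper's normalization scheme, so this should not be read as the authors' method.

```python
import numpy as np


def mean_interpolate(latents, weights):
    """Baseline: weighted linear combination of latents.

    latents has shape (n, d); weights has shape (n,).
    """
    weights = np.asarray(weights, dtype=np.float64)
    return np.tensordot(weights, latents, axes=1)


def rescale_to_expected_norm(z):
    """Illustrative remedy (an assumption, not the paper's scheme):
    rescale z so that its norm equals sqrt(d), the typical norm of a
    d-dimensional standard Gaussian sample."""
    d = z.size
    return z * (np.sqrt(d) / np.linalg.norm(z))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4096  # hypothetical latent dimensionality
    for n in (2, 8, 64):
        # Stand-in for latents obtained by inverting n input images.
        latents = rng.standard_normal((n, d))
        weights = np.full(n, 1.0 / n)
        z = mean_interpolate(latents, weights)
        z_fixed = rescale_to_expected_norm(z)
        # Report each norm relative to the expected norm sqrt(d).
        print(
            n,
            np.linalg.norm(z) / np.sqrt(d),
            np.linalg.norm(z_fixed) / np.sqrt(d),
        )
```

In this toy setting the relative norm of the baseline mixture falls roughly as 1 / sqrt(n), while the rescaled latent stays at 1 by construction. Real inverted latents are not exactly Gaussian or independent, so the sketch only illustrates the direction of the effect, not its size in practice.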