Disentangled Image Generation Through Structured Noise Injection
This work addresses the challenge of controlling specific image attributes in generative models, with applications such as facial expression editing; however, it is incremental, as it builds on existing GAN frameworks.
The paper tackles the problem of disentangling the latent space of GANs for image generation. It proposes structured noise injection, in which multiple noise codes are fed through separate fully-connected layers, and achieves spatial, scale-space, and foreground-background disentanglement without labels. The result is better disentanglement scores on the FFHQ dataset than state-of-the-art methods.
We explore different design choices for injecting noise into generative adversarial networks (GANs) with the goal of disentangling the latent space. Instead of feeding a single noise vector into the generator, as traditional approaches do, we propose feeding multiple noise codes through separate fully-connected layers. The aim is to restrict the influence of each noise code to specific parts of the generated image. We show that disentanglement in the first layer of the generator network leads to disentanglement in the generated image. Through a grid-based structure, we achieve several aspects of disentanglement without complicating the network architecture and without requiring labels: spatial disentanglement, scale-space disentanglement, and disentanglement of the foreground object from the background style, which together allow fine-grained control over the generated images. Examples include changing facial expressions in face images, beak length in bird images, and car dimensions in car images. Empirically, this yields better disentanglement scores than state-of-the-art methods on the FFHQ dataset.
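To make the grid-based idea concrete, here is a minimal sketch (not the authors' implementation) of a first layer that restricts each noise code's influence to one region. It assumes a square grid of cells, a separate fully-connected layer per cell for that cell's local code, and one shared code whose mapped features are appended to every cell. The names `make_fc`, `apply_fc`, `StructuredFirstLayer`, and `sample_codes`, the ReLU activation, and all dimensions are hypothetical choices for illustration; plain Python lists stand in for tensors.

```python
import random


def make_fc(in_dim, out_dim, rng):
    """Hypothetical fully-connected layer stored as (weights, bias), random init."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def apply_fc(layer, x):
    """Compute ReLU(W @ x + b) with plain lists."""
    weights, bias = layer
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class StructuredFirstLayer:
    """Maps a grid of local codes plus one shared code to a grid of feature vectors.

    Assumption: each cell owns its own FC layer, so a cell's local code can
    only affect that cell's features. The shared code goes through one extra
    FC layer whose output is concatenated onto every cell.
    """

    def __init__(self, grid=4, local_dim=8, shared_dim=8, feat_dim=16, seed=0):
        rng = random.Random(seed)
        self.grid = grid
        self.local_fcs = [[make_fc(local_dim, feat_dim, rng) for _ in range(grid)]
                          for _ in range(grid)]
        self.shared_fc = make_fc(shared_dim, feat_dim, rng)

    def __call__(self, local_codes, shared_code):
        shared_feat = apply_fc(self.shared_fc, shared_code)
        # Per-cell features: [local features of this cell] + [shared features].
        return [[apply_fc(self.local_fcs[i][j], local_codes[i][j]) + shared_feat
                 for j in range(self.grid)]
                for i in range(self.grid)]


def sample_codes(grid, local_dim, shared_dim, rng):
    """Draw Gaussian local codes (one per cell) and one shared code."""
    local = [[[rng.gauss(0.0, 1.0) for _ in range(local_dim)] for _ in range(grid)]
             for _ in range(grid)]
    shared = [rng.gauss(0.0, 1.0) for _ in range(shared_dim)]
    return local, shared


if __name__ == "__main__":
    rng = random.Random(1)
    layer = StructuredFirstLayer()
    local, shared = sample_codes(4, 8, 8, rng)
    before = layer(local, shared)
    # Resample only the local code of cell (1, 2).
    local[1][2] = [rng.gauss(0.0, 1.0) for _ in range(8)]
    after = layer(local, shared)
    changed = [(i, j) for i in range(4) for j in range(4) if before[i][j] != after[i][j]]
    print("cells whose features changed:", changed)
```

By construction, resampling the local code of one cell leaves every other cell's features untouched, which illustrates the kind of first-layer spatial disentanglement described above; the shared code, by contrast, affects all cells identically. Whether the paper uses per-cell weights or shares weights across cells is not stated in this section, so treat that detail as an assumption.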