Controlling generative models with continuous factors of variations
This work addresses the limited usability of generative models for researchers and practitioners by enabling fine-grained manipulation of generated images without human annotations. The contribution is incremental, since it builds on prior work on latent space semantics.
The paper tackles the lack of control and interpretability in generative models by introducing a method that finds meaningful directions in the latent space, giving precise control over image properties such as position, scale, and color. It demonstrates the method's effectiveness for both GANs and VAEs.
Recent deep generative models can produce photo-realistic images as well as visual or textual content embeddings useful for various tasks in computer vision and natural language processing. Their usefulness is nevertheless often limited by the lack of control over the generative process or a poor understanding of the learned representation. To overcome these major issues, very recent work has shown the value of studying the semantics of the latent space of generative models. In this paper, we advance the interpretability of the latent space of generative models by introducing a new method to find meaningful directions in the latent space of any generative model. Moving along these directions gives precise control over specific properties of the generated image, such as the position or scale of the object in the image. Our method does not require human annotations and is particularly well suited to finding directions that encode simple transformations of the generated image, such as translation, zoom, or color variations. We demonstrate the effectiveness of our method qualitatively and quantitatively, for both GANs and variational auto-encoders.
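To make the idea of moving along a latent direction concrete, here is a minimal sketch in plain Python. It is not the paper's method: it does not show how a direction is found, only what controlling a property by moving along one looks like. The names `toy_generator`, `move_along`, `w`, and `u` are hypothetical, and the toy generator assumes the object's horizontal position is a linear function of the latent code.

```python
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def move_along(z, u, t):
    """Return the latent code z shifted by t units along direction u."""
    return [zi + t * ui for zi, ui in zip(z, u)]


def toy_generator(z, w):
    """Hypothetical stand-in for a generator.

    Instead of an image, it returns only the object's horizontal
    position, assumed here to be linear in z (an illustrative assumption).
    """
    return sum(zi * wi for zi, wi in zip(z, w))


random.seed(0)
dim = 8
w = [random.gauss(0, 1) for _ in range(dim)]  # toy model parameters
u = normalize(w)  # in this toy model, the direction that changes position fastest
z = [random.gauss(0, 1) for _ in range(dim)]  # a sampled latent code

# Walk along u and record the resulting position at each step.
steps = (-2, -1, 0, 1, 2)
positions = [toy_generator(move_along(z, u, t), w) for t in steps]

for t, p in zip(steps, positions):
    print(f"t={t:+d}  position={p:.3f}")
```

In this toy setting the position changes by the same amount for every unit step along `u`, which is the kind of precise, continuous control the abstract describes. With a real generator the direction is not known in advance, and finding it without human annotations is the problem the paper addresses.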