MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation
The work addresses the one-to-one mapping limitation of image-to-image translation in computer vision applications, though its contribution is incremental in that it builds on existing disentangled-representation methods.
The paper tackles the problem of unpaired multimodal image-to-image translation by proposing a model that uses content and style representations with a new mutual information loss, achieving state-of-the-art performance on various real-world datasets.
Unpaired multimodal image-to-image translation is the task of translating a given image in a source domain into diverse images in the target domain, overcoming the limitation of one-to-one mapping. Existing multimodal translation models are mainly based on disentangled representations trained with an image reconstruction loss. We propose two approaches to improve multimodal translation quality. First, we use a content representation from the source domain conditioned on a style representation from the target domain. Second, rather than using a typical image reconstruction loss, we design MILO (Mutual Information LOss), a new stochastically defined loss function based on information theory. This loss function directly reflects the interpretation of latent variables as random variables. We show through extensive experiments on various real-world datasets that our proposed model, Mutual Information with StOchastic Style Representation (MISO), achieves state-of-the-art performance.
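To illustrate the difference between a reconstruction loss and a loss that treats the style code as a random variable, the following is a minimal sketch, not the MILO formulation itself, which the abstract does not spell out. It assumes a style encoder that predicts a mean and log-variance for the style code of a translated image, so that maximizing a variational lower bound on the mutual information between the sampled style and the output reduces to minimizing a Gaussian negative log-likelihood. The function names `gaussian_nll` and `l1_reconstruction` and all numeric values are hypothetical.

```python
import math
import random


def gaussian_nll(style, mean, log_var):
    """Negative log-density of `style` under a diagonal Gaussian
    with the given per-dimension mean and log-variance (assumed encoder outputs)."""
    total = 0.0
    for s, m, lv in zip(style, mean, log_var):
        total += 0.5 * (math.log(2.0 * math.pi) + lv + (s - m) ** 2 / math.exp(lv))
    return total


def l1_reconstruction(style, mean):
    """Deterministic L1 latent reconstruction loss, shown for contrast."""
    return sum(abs(s - m) for s, m in zip(style, mean))


# Hypothetical 8-dimensional style code sampled from a standard normal prior.
rng = random.Random(0)
style = [rng.gauss(0.0, 1.0) for _ in range(8)]

# Same (perfect) mean prediction, but different predicted uncertainty.
confident = gaussian_nll(style, style, [-2.0] * 8)
uncertain = gaussian_nll(style, style, [0.0] * 8)

# The L1 loss cannot distinguish the two cases; the likelihood-based loss can.
print(l1_reconstruction(style, style), confident, uncertain)
```

In this sketch the L1 loss is zero in both cases because it only compares point estimates, whereas the likelihood-based term also rewards a well-calibrated, narrow predicted distribution. That is one way a loss can reflect the stochastic interpretation of the latent code.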