Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound
This work addresses a limitation of existing methods, which fail to model domain-specific information in cross-modal translation, and offers a way to generate diverse outputs in tasks such as style transfer or medical imaging.
The paper tackles the ambiguity that arises in unsupervised image-to-image translation when the source and target domains come from different modalities. It proposes a framework that maximizes a domain-specific variational information bound, so that a single source image can be mapped to multiple target images using sampled or reference codes.
Unsupervised image-to-image translation is a class of computer vision problems that aims to model the conditional distribution of images in the target domain, given a set of unpaired images in the source and target domains. An image in the source domain might have multiple representations in the target domain. Ambiguity therefore arises in modeling the conditional distribution, especially when the images in the source and target domains come from different modalities. Current approaches mostly rely on simplifying assumptions to map both domains into a shared latent space. Consequently, they are only able to model the domain-invariant information between the two modalities. These approaches usually fail to model domain-specific information which has no representation in the target domain. In this work, we propose an unsupervised image-to-image translation framework which maximizes a domain-specific variational information bound and learns the target domain-invariant representation of the two domains. The proposed framework makes it possible to map a single source image into multiple images in the target domain, using several target domain-specific codes that are either sampled randomly from the prior distribution or extracted from reference images.
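
The one-to-many mapping described above can be illustrated with a toy sketch. This is not the paper's model: the functions `encode_content`, `encode_specific`, `sample_prior`, and `decode` are hypothetical stand-ins for learned networks, images are flat lists of floats, the prior is assumed to be a standard normal, and the code size `CODE_DIM` is arbitrary. The sketch only shows how one domain-invariant representation can be combined with several domain-specific codes, some sampled and one taken from a reference image, to give several target outputs.

```python
import random

CODE_DIM = 4  # assumed size of the target domain-specific code


def encode_content(source_image):
    """Hypothetical domain-invariant encoder: mean-centres the pixels."""
    mean = sum(source_image) / len(source_image)
    return [p - mean for p in source_image]


def encode_specific(reference_image):
    """Hypothetical domain-specific encoder: averages CODE_DIM equal chunks."""
    chunk = len(reference_image) // CODE_DIM
    return [
        sum(reference_image[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(CODE_DIM)
    ]


def sample_prior(rng):
    """Draws a domain-specific code from an assumed standard normal prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(CODE_DIM)]


def decode(content, code):
    """Hypothetical target generator: shifts the content by the code values."""
    return [c + code[i % CODE_DIM] for i, c in enumerate(content)]


def translate(source_image, codes):
    """Maps one source image to one target output per domain-specific code."""
    content = encode_content(source_image)  # computed once, shared by all outputs
    return [decode(content, code) for code in codes]


rng = random.Random(0)
source = [0.1, 0.4, 0.3, 0.8, 0.5, 0.2, 0.9, 0.6]
reference = [0.7, 0.7, 0.2, 0.2, 0.5, 0.5, 0.9, 0.9]

# Three codes sampled from the prior plus one extracted from a reference image.
codes = [sample_prior(rng) for _ in range(3)] + [encode_specific(reference)]
outputs = translate(source, codes)
```

In this sketch the source image is encoded once and only the code changes between outputs, which mirrors the separation between domain-invariant and domain-specific information that the abstract describes.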