CV · Mar 15, 2020

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling

arXiv:2003.06788v2 · 29 citations
AI Analysis

This work provides a unifying framework for unsupervised image-to-image translation, potentially benefiting researchers and practitioners in computer vision by improving flexibility and diversity in domain translation tasks.

The paper tackles the problem of unsupervised multi-domain and multi-modal image-to-image translation by proposing GMM-UNIT, which uses a Gaussian mixture model to represent domains in a disentangled attribute space, enabling interpolation and extrapolation to unseen domains while addressing mode collapse.
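
To make "one Gaussian component per domain" concrete, here is a minimal sketch of such an attribute space. It does not reproduce the authors' implementation. The two-dimensional space, the domain names `domain_a`/`domain_b`/`domain_c`, their means, the shared isotropic standard deviation, and the equal mixture weights are all hypothetical choices made for illustration. Sampling several attribute vectors from one component yields several distinct "styles" for that domain, which is the multi-modal behaviour that counters mode collapse. Scoring a vector against every component recovers the domain it belongs to.

```python
import math
import random

# Hypothetical 2-D attribute space with three domains. Each domain is one
# Gaussian component; the means and shared std below are illustrative only.
DOMAIN_MEANS = {
    "domain_a": [1.0, 0.0],
    "domain_b": [0.0, 1.0],
    "domain_c": [-1.0, -1.0],
}
DOMAIN_STD = 0.1  # assumed isotropic standard deviation shared by all components


def sample_attribute(domain, rng):
    """Draw z ~ N(mu_domain, sigma^2 I); each draw is a different style of that domain."""
    return [rng.gauss(mu, DOMAIN_STD) for mu in DOMAIN_MEANS[domain]]


def log_gaussian(z, mean, std):
    """Log-density of an isotropic Gaussian N(mean, std^2 I) evaluated at z."""
    d = len(z)
    sq = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return -0.5 * sq / std ** 2 - d * math.log(std) - 0.5 * d * math.log(2 * math.pi)


def most_likely_domain(z):
    """Return the component with the highest likelihood for z (equal mixture weights assumed)."""
    return max(DOMAIN_MEANS, key=lambda k: log_gaussian(z, DOMAIN_MEANS[k], DOMAIN_STD))


if __name__ == "__main__":
    rng = random.Random(0)
    for domain in DOMAIN_MEANS:
        styles = [sample_attribute(domain, rng) for _ in range(3)]
        print(domain, [most_likely_domain(z) for z in styles])
```

In a full model, each sampled vector would be paired with an image's content code and passed to a generator. That generator is outside the scope of this sketch.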

Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images. Recent studies have shown remarkable success for multiple domains, but they suffer from two main limitations: they are either built from several two-domain mappings that are required to be learned independently, or they generate low-diversity results, a problem known as mode collapse. To overcome these limitations, we propose a method named GMM-UNIT, which is based on a content-attribute disentangled representation where the attribute space is fitted with a Gaussian mixture model (GMM). Each GMM component represents a domain, and this simple assumption has two prominent advantages. First, it can be easily extended to most multi-domain and multi-modal image-to-image translation tasks. Second, the continuous domain encoding allows for interpolation between domains and for extrapolation to unseen domains and translations. Additionally, we show how GMM-UNIT can be constrained down to different methods in the literature, meaning that GMM-UNIT is a unifying framework for unsupervised image-to-image translation.
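
Because every domain is a point (the component mean) in a continuous attribute space, moving between domains can be pictured as moving along a path between their means. The sketch below assumes a simple linear path, which is one plausible choice rather than a confirmed detail of GMM-UNIT. The names `MU_SOURCE`, `MU_TARGET`, and `blend`, and the three-dimensional mean values, are hypothetical. Values of `alpha` in [0, 1] interpolate between the two domains; values outside that range extrapolate past them into regions that no training domain occupies.

```python
# Hypothetical domain means in a 3-D attribute space (illustrative values only).
MU_SOURCE = [1.0, 0.0, 0.5]
MU_TARGET = [0.0, 1.0, -0.5]


def blend(mu_a, mu_b, alpha):
    """Linear path z(alpha) = (1 - alpha) * mu_a + alpha * mu_b.

    0 <= alpha <= 1 interpolates between the two domains; alpha > 1 or
    alpha < 0 extrapolates beyond them (an assumed, linear scheme).
    """
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(mu_a, mu_b)]


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0, 1.5):
        print(alpha, blend(MU_SOURCE, MU_TARGET, alpha))
```

For example, `blend(MU_SOURCE, MU_TARGET, 0.5)` gives `[0.5, 0.5, 0.0]`, halfway between the two domains, while `alpha = 1.5` gives `[-0.5, 1.5, -1.0]`, a point beyond the target domain. A decoder that accepts attribute vectors would be needed to turn these points into images; it is not modelled here.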

Code Implementations: 1 repo