CV · Mar 15, 2020

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling

Yahui Liu, Marco De Nadai, Jian Yao, Nicu Sebe, Bruno Lepri, Xavier Alameda-Pineda

arXiv:2003.06788v2 · 12.029 citations · h-index: 97 · Has Code

Originality: Highly original

AI Analysis

This work provides a unifying framework for unsupervised image-to-image translation, potentially benefiting researchers and practitioners in computer vision by improving flexibility and diversity in domain translation tasks.

The paper tackles the problem of unsupervised multi-domain and multi-modal image-to-image translation by proposing GMM-UNIT, which uses a Gaussian mixture model to represent domains in a disentangled attribute space, enabling interpolation and extrapolation to unseen domains while addressing mode collapse.

Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images. Recent studies have shown remarkable success for multiple domains but they suffer from two main limitations: they are either built from several two-domain mappings that are required to be learned independently, or they generate low-diversity results, a problem known as mode collapse. To overcome these limitations, we propose a method named GMM-UNIT, which is based on a content-attribute disentangled representation where the attribute space is fitted with a GMM. Each GMM component represents a domain, and this simple assumption has two prominent advantages. First, it can be easily extended to most multi-domain and multi-modal image-to-image translation tasks. Second, the continuous domain encoding allows for interpolation between domains and for extrapolation to unseen domains and translations. Additionally, we show how GMM-UNIT can be constrained down to different methods in the literature, meaning that GMM-UNIT is a unifying framework for unsupervised image-to-image translation.
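The attribute-space idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the real method learns encoders and a generator, while this example only shows an attribute space with one Gaussian component per domain, sampling a code for a chosen domain, and blending two domain codes. The component means, the shared standard deviation, and the helper names `sample_attribute` and `interpolate` are all assumptions made for illustration.

```python
import random

def sample_attribute(means, std, domain, rng):
    """Draw one attribute code from the Gaussian component of `domain`."""
    return [rng.gauss(m, std) for m in means[domain]]

def interpolate(z_a, z_b, t):
    """Linear blend of two attribute codes; t outside [0, 1] extrapolates."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

# Hypothetical setup: 3 domains, a 2-D attribute space, isotropic components.
means = [[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
rng = random.Random(0)  # seeded so the draw is repeatable

z = sample_attribute(means, 0.5, 1, rng)        # random code for domain 1
mid = interpolate(means[0], means[1], 0.5)      # halfway between domains 0 and 1
beyond = interpolate(means[0], means[1], 1.5)   # extrapolation past domain 1

print(mid)     # [0.0, 0.0]
print(beyond)  # [4.0, 0.0]
```

In a full model, a code such as `mid` or `beyond` would be passed to the generator together with the content representation of an input image. Sampling several codes from the same component is what would yield multiple outputs for one target domain.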
