CV, LG | Jun 11, 2020

Rethinking the Truly Unsupervised Image-to-Image Translation

arXiv:2006.06500v2 | 117 citations
Originality: Highly original
AI Analysis

This work addresses the data-collection bottleneck in image-to-image translation by enabling fully unsupervised learning. The contribution is incremental in that it builds on existing translation models, but it removes their supervision requirements.

The paper tackles image-to-image translation without any supervision, meaning neither paired images nor domain labels. It proposes a model that simultaneously learns to separate images into domains and to translate images into these estimated domains. On various datasets, the model performs comparably to or better than a set-level supervised model trained with full labels.

Every recent image-to-image translation model inherently requires either image-level (i.e. input-output pairs) or set-level (i.e. domain labels) supervision. However, even set-level supervision can be a severe bottleneck for data collection in practice. In this paper, we tackle image-to-image translation in a fully unsupervised setting, i.e., neither paired images nor domain labels. To this end, we propose a truly unsupervised image-to-image translation model (TUNIT) that simultaneously learns to separate image domains and translates input images into the estimated domains. Experimental results show that our model achieves comparable or even better performance than the set-level supervised model trained with full labels, generalizes well on various datasets, and is robust against the choice of hyperparameters (e.g. the preset number of pseudo domains). Furthermore, TUNIT can be easily extended to semi-supervised learning with a few labeled data.
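The abstract does not say how images are assigned to estimated domains. As a rough illustration only, the sketch below assumes a classifier that outputs one score per pseudo domain for each image and takes the highest-scoring domain as the pseudo label. The function names, the example scores, and the choice of K = 3 are all hypothetical; this is not the TUNIT implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_pseudo_domains(logits_batch):
    """Return, for each image, the index of its most likely pseudo domain."""
    labels = []
    for logits in logits_batch:
        probs = softmax(logits)
        labels.append(max(range(len(probs)), key=probs.__getitem__))
    return labels


def group_by_domain(labels, num_domains):
    """Collect image indices under each pseudo domain index."""
    groups = {k: [] for k in range(num_domains)}
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return groups


# Hypothetical classifier scores for 4 images with K = 3 preset pseudo domains.
logits_batch = [
    [2.0, 0.1, -1.0],
    [0.0, 3.0, 0.5],
    [1.5, 1.0, 0.2],
    [-0.3, 0.2, 2.2],
]
labels = assign_pseudo_domains(logits_batch)
groups = group_by_domain(labels, num_domains=3)
```

In a setup like this, the resulting groups would stand in for the domain labels that a set-level supervised translation model normally receives as input.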

Code Implementations: 1 repo