LGAICVJan 18, 2024

Towards Identifiable Unsupervised Domain Translation: A Diversified Distribution Matching Approach

arXiv:2401.09671v37 citationsICLR
Originality Highly original
AI Analysis

This work addresses a core limitation in domain translation for computer vision applications, offering a novel solution to ensure content-aligned translations, though it is incremental in building on distribution matching approaches.

The paper tackles the identifiability problem in unsupervised domain translation, where existing methods like CycleGAN can fail due to multiple possible translation functions, and introduces a theory and framework that matches diverse cross-domain conditional distributions to eliminate these ambiguities, achieving improved translation accuracy with experimental validation.

Unsupervised domain translation (UDT) aims to find functions that convert samples from one domain (e.g., sketches) to another domain (e.g., photos) without changing the high-level semantic meaning (also referred to as ``content''). The translation functions are often sought by probability distribution matching of the transformed source domain and target domain. CycleGAN stands as arguably the most representative approach among this line of work. However, it was noticed in the literature that CycleGAN and variants could fail to identify the desired translation functions and produce content-misaligned translations. This limitation arises due to the presence of multiple translation functions -- referred to as ``measure-preserving automorphism" (MPA) -- in the solution space of the learning criteria. Despite awareness of such identifiability issues, solutions have remained elusive. This study delves into the core identifiability inquiry and introduces an MPA elimination theory. Our analysis shows that MPA is unlikely to exist, if multiple pairs of diverse cross-domain conditional distributions are matched by the learning function. Our theory leads to a UDT learner using distribution matching over auxiliary variable-induced subsets of the domains -- other than over the entire data domains as in the classical approaches. The proposed framework is the first to rigorously establish translation identifiability under reasonable UDT settings, to our best knowledge. Experiments corroborate with our theoretical claims.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes