One-for-All: Towards Universal Domain Translation with a Single StyleGAN
This work addresses universal domain translation for computer vision, offering a single versatile model for tasks such as style mixing and stylization, though the contribution appears incremental because it builds on the existing CLIP and StyleGAN frameworks.
The paper tackles the problem of translating images between visually distinct domains with limited training data. It proposes UniTranslator, which uses CLIP together with a new CLIP2P mapper to bridge the CLIP and StyleGAN spaces, producing high-quality translations that outperform existing general-purpose models and compete with specialized ones.
In this paper, we propose UniTranslator, a novel translation model for transforming representations between visually distinct domains when training data are limited and the visual gap between domains is large. The main idea is to use the domain-neutral capabilities of CLIP as a bridging mechanism, while a separate module extracts abstract, domain-agnostic semantics from the embeddings of both the source and target domains. Fusing these abstract semantics with target-specific semantics yields a transformed embedding in the CLIP space. To bridge the gap between the CLIP and StyleGAN spaces, we introduce a new non-linear mapper, the CLIP2P mapper. Taking CLIP embeddings as input, this module is tailored to approximate the latent distribution in StyleGAN's latent space, effectively acting as a connector between the two spaces. UniTranslator is versatile and can perform various tasks, including style mixing, stylization, and translation, even in visually challenging scenarios across different visual domains. Notably, it generates high-quality translations that exhibit domain relevance, diversity, and improved image quality. UniTranslator outperforms existing general-purpose models and performs well against specialized models in representative tasks. The source code and trained models will be released to the public.
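The data flow described in the abstract (fuse abstract semantics with target-specific semantics in CLIP space, then map the result into the StyleGAN latent space with a non-linear mapper) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion rule or the mapper architecture, so the convex blend in `fuse`, the two-layer network in `ToyMapper`, the random untrained weights, and the tiny dimensions are all assumptions made purely for illustration. In particular, the CLIP2P mapper is said to approximate a latent distribution, whereas this sketch is a deterministic point mapping.

```python
import math
import random

CLIP_DIM = 8     # toy size (assumption); real CLIP embeddings are much larger
LATENT_DIM = 6   # toy size (assumption) for the StyleGAN latent code


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def fuse(abstract_sem, target_sem, alpha=0.5):
    """Hypothetical fusion rule: convex blend of the two embeddings,
    renormalized so the result stays on the unit sphere."""
    mixed = [alpha * a + (1.0 - alpha) * t
             for a, t in zip(abstract_sem, target_sem)]
    return normalize(mixed)


def linear(x, weight, bias):
    """Dense layer: one output per row of `weight`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def leaky_relu(x, slope=0.2):
    """Element-wise leaky ReLU, the non-linearity of the toy mapper."""
    return [xi if xi > 0 else slope * xi for xi in x]


class ToyMapper:
    """Hypothetical stand-in for a non-linear CLIP-to-latent mapper:
    a two-layer perceptron with random, untrained weights."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            scale = 1.0 / math.sqrt(cols)
            return [[rng.gauss(0.0, scale) for _ in range(cols)]
                    for _ in range(rows)]

        self.w1 = init(hidden_dim, in_dim)
        self.b1 = [0.0] * hidden_dim
        self.w2 = init(out_dim, hidden_dim)
        self.b2 = [0.0] * out_dim

    def __call__(self, clip_embedding):
        hidden = leaky_relu(linear(clip_embedding, self.w1, self.b1))
        return linear(hidden, self.w2, self.b2)


# Random unit vectors stand in for real CLIP embeddings (assumption).
rng = random.Random(1)
abstract_sem = normalize([rng.gauss(0.0, 1.0) for _ in range(CLIP_DIM)])
target_sem = normalize([rng.gauss(0.0, 1.0) for _ in range(CLIP_DIM)])

translated = fuse(abstract_sem, target_sem)   # embedding in CLIP space
mapper = ToyMapper(CLIP_DIM, 16, LATENT_DIM)
latent = mapper(translated)                   # code in the latent space
```

In a real pipeline the latent code would be passed to a pretrained StyleGAN generator, and the mapper would be trained rather than randomly initialized; the sketch only shows how the two stages compose.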