CVJan 23, 2024

UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation

arXiv:2401.12596v2h-index: 17
AI Analysis

This work addresses a domain-specific problem for researchers in generative AI by enabling hybrid domain adaptation across modalities, though it is incremental in building on existing methods.

The paper tackles the problem of adapting pre-trained generative models to multiple target domains using multi-modal references, achieving realistic image synthesis with various attribute compositions.

Recently, generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods simply adapt the generator to a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they cannot maintain well consistency with the source domain, which impedes the inheritance of the diversity. In this paper, we propose UniHDA, a \textbf{unified} and \textbf{versatile} framework for generative hybrid domain adaptation with multi-modal references from multiple domains. We use CLIP encoder to project multi-modal references into a unified embedding space and then linearly interpolate the direction vectors from multiple target domains to achieve hybrid domain adaptation. To ensure \textbf{consistency} with the source domain, we propose a novel cross-domain spatial structure (CSS) loss that maintains detailed spatial structure information between source and target generator. Experiments show that the adapted generator can synthesise realistic images with various attribute compositions. Additionally, our framework is generator-agnostic and versatile to multiple generators, e.g., StyleGAN, EG3D, and Diffusion Models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes