CV · Feb 24, 2022

Towards Unsupervised Domain Adaptation via Domain-Transformer

arXiv:2202.13777v2 · 18 citations
Originality: Incremental advance
AI Analysis

This work addresses domain adaptation for machine learning applications where labeled data is scarce. It offers a plug-and-play, interpretable method, though the advance appears incremental because it builds on existing transformer-based approaches.

The paper tackles the problem of Unsupervised Domain Adaptation (UDA) by proposing the Domain-Transformer (DoT) with a domain-level attention mechanism to capture long-range correspondences between cross-domain samples, achieving improved performance on benchmark datasets without requiring pseudo-labels or explicit domain discrepancy optimization.

As a vital problem in pattern analysis and machine intelligence, Unsupervised Domain Adaptation (UDA) attempts to transfer an effective feature learner from a labeled source domain to an unlabeled target domain. Inspired by the success of the Transformer, several advances in UDA have been achieved by adopting pure transformers as network architectures, but such a simple application can only capture patch-level information and lacks interpretability. To address these issues, we propose the Domain-Transformer (DoT) with a domain-level attention mechanism to capture the long-range correspondence between cross-domain samples. On the theoretical side, we provide a mathematical understanding of DoT: 1) we connect the domain-level attention with optimal transport theory, which provides interpretability from Wasserstein geometry; 2) from the perspective of learning theory, Wasserstein distance-based generalization bounds are derived, which explain the effectiveness of DoT for knowledge transfer. On the methodological side, DoT integrates the domain-level attention and manifold structure regularization, which characterize the sample-level information and locality consistency for cross-domain cluster structures. Besides, the domain-level attention mechanism can be used as a plug-and-play module, so DoT can be implemented under different neural network architectures. Instead of explicitly modeling the distribution discrepancy at the domain level or class level, DoT learns transferable features under the guidance of long-range correspondence, so it is free of pseudo-labels and explicit domain discrepancy optimization. Extensive experimental results on several benchmark datasets validate the effectiveness of DoT.
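
To make the idea of attention across samples (rather than across patches) concrete, here is a minimal NumPy sketch. It is not the paper's implementation. The function name `domain_level_attention`, the `temperature` parameter, and the choice of target features as queries with source features as keys and values are all assumptions made for illustration.

```python
import numpy as np

def domain_level_attention(target_feats, source_feats, temperature=1.0):
    """Illustrative cross-domain attention over samples (an assumption,
    not the authors' code).

    target_feats: (n_t, d) array, one row per target sample (queries).
    source_feats: (n_s, d) array, one row per source sample (keys, values).
    Returns the (n_t, n_s) attention matrix and the (n_t, d) transformed
    target features, each a convex combination of source features.
    """
    d = target_feats.shape[1]
    # Scaled dot-product similarity between every target/source pair.
    scores = target_feats @ source_feats.T / (np.sqrt(d) * temperature)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    attn = weights / weights.sum(axis=1, keepdims=True)
    return attn, attn @ source_feats

# Usage with random stand-in features (hypothetical data).
rng = np.random.default_rng(0)
source = rng.normal(size=(6, 4))
target = rng.normal(size=(3, 4))
attn, transformed = domain_level_attention(target, source)
print(attn.shape, transformed.shape)  # (3, 6) (3, 4)
```

Each row of `attn` is a probability distribution over source samples, which is the row-marginal constraint of a transport plan. Entropic optimal transport solvers such as Sinkhorn iterations additionally enforce the column marginals, which is one way to read the link to optimal transport that the abstract mentions.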

