CVFeb 24, 2022

Towards Unsupervised Domain Adaptation via Domain-Transformer

Ren Chuan-Xian, Zhai Yi-Ming, Luo You-Wei, Yan Hong

arXiv:2202.13777v218 citations

Originality Incremental advance

AI Analysis

This work addresses domain adaptation for machine learning applications where labeled data is scarce, offering a novel method that is plug-and-play and interpretable, though it appears incremental as it builds on existing transformer-based approaches.

The paper tackles the problem of Unsupervised Domain Adaptation (UDA) by proposing the Domain-Transformer (DoT) with a domain-level attention mechanism to capture long-range correspondences between cross-domain samples, achieving improved performance on benchmark datasets without requiring pseudo-labels or explicit domain discrepancy optimization.

As a vital problem in pattern analysis and machine intelligence, Unsupervised Domain Adaptation (UDA) attempts to transfer an effective feature learner from a labeled source domain to an unlabeled target domain. Inspired by the success of the Transformer, several advances in UDA are achieved by adopting pure transformers as network architectures, but such a simple application can only capture patch-level information and lacks interpretability. To address these issues, we propose the Domain-Transformer (DoT) with domain-level attention mechanism to capture the long-range correspondence between the cross-domain samples. On the theoretical side, we provide a mathematical understanding of DoT: 1) We connect the domain-level attention with optimal transport theory, which provides interpretability from Wasserstein geometry; 2) From the perspective of learning theory, Wasserstein distance-based generalization bounds are derived, which explains the effectiveness of DoT for knowledge transfer. On the methodological side, DoT integrates the domain-level attention and manifold structure regularization, which characterize the sample-level information and locality consistency for cross-domain cluster structures. Besides, the domain-level attention mechanism can be used as a plug-and-play module, so DoT can be implemented under different neural network architectures. Instead of explicitly modeling the distribution discrepancy at domain-level or class-level, DoT learns transferable features under the guidance of long-range correspondence, so it is free of pseudo-labels and explicit domain discrepancy optimization. Extensive experiment results on several benchmark datasets validate the effectiveness of DoT.

View on arXiv PDF

Similar