Heterogeneous-Modal Unsupervised Domain Adaptation via Latent Space Bridging
This addresses a critical limitation in domain adaptation for scenarios with completely different modalities, such as transferring knowledge between images and text, which is incremental as it builds on existing UDA methods.
The paper tackles the problem of unsupervised domain adaptation when source and target domains are from entirely distinct modalities, proposing a novel setting called Heterogeneous-Modal Unsupervised Domain Adaptation (HMUDA) and a framework called Latent Space Bridging (LSB) for semantic segmentation, achieving state-of-the-art performance on six benchmark datasets.
Unsupervised domain adaptation (UDA) methods effectively bridge domain gaps but become struggled when the source and target domains belong to entirely distinct modalities. To address this limitation, we propose a novel setting called Heterogeneous-Modal Unsupervised Domain Adaptation (HMUDA), which enables knowledge transfer between completely different modalities by leveraging a bridge domain containing unlabeled samples from both modalities. To learn under the HMUDA setting, we propose Latent Space Bridging (LSB), a specialized framework designed for the semantic segmentation task. Specifically, LSB utilizes a dual-branch architecture, incorporating a feature consistency loss to align representations across modalities and a domain alignment loss to reduce discrepancies between class centroids across domains. Extensive experiments conducted on six benchmark datasets demonstrate that LSB achieves state-of-the-art performance.