CVJun 24, 2025

Progressive Modality Cooperation for Multi-Modality Domain Adaptation

arXiv:2506.19316v126 citationsh-index: 98IEEE Transactions on Image Processing
Originality Incremental advance
AI Analysis

This work addresses domain adaptation challenges in multi-modality settings, which is incremental as it builds on existing methods by introducing new modules for modality cooperation and generation.

The paper tackles multi-modality domain adaptation by proposing a Progressive Modality Cooperation (PMC) framework to transfer knowledge from source to target domains using multiple modalities like RGB and depth, achieving effectiveness across 11 datasets for visual recognition tasks.

In this work, we propose a new generic multi-modality domain adaptation framework called Progressive Modality Cooperation (PMC) to transfer the knowledge learned from the source domain to the target domain by exploiting multiple modality clues (\eg, RGB and depth) under the multi-modality domain adaptation (MMDA) and the more general multi-modality domain adaptation using privileged information (MMDA-PI) settings. Under the MMDA setting, the samples in both domains have all the modalities. In two newly proposed modules of our PMC, the multiple modalities are cooperated for selecting the reliable pseudo-labeled target samples, which captures the modality-specific information and modality-integrated information, respectively. Under the MMDA-PI setting, some modalities are missing in the target domain. Hence, to better exploit the multi-modality data in the source domain, we further propose the PMC with privileged information (PMC-PI) method by proposing a new multi-modality data generation (MMG) network. MMG generates the missing modalities in the target domain based on the source domain data by considering both domain distribution mismatch and semantics preservation, which are respectively achieved by using adversarial learning and conditioning on weighted pseudo semantics. Extensive experiments on three image datasets and eight video datasets for various multi-modality cross-domain visual recognition tasks under both MMDA and MMDA-PI settings clearly demonstrate the effectiveness of our proposed PMC framework.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes