CV · Sep 8, 2022

Cross-Modal Knowledge Transfer Without Task-Relevant Source Data

arXiv:2209.04027v1 · 22 citations · h-index: 63
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of building computer vision systems for depth and infrared data, where labeled datasets are scarce. The advance is incremental: it builds on existing source-free transfer methods and extends them to handle the modality gap explicitly.

The paper tackles the problem of transferring knowledge from a source modality (e.g., RGB) to a target modality (e.g., depth or infrared) without access to task-relevant source data, and introduces SOCKET, a framework that reduces the modality gap using paired task-irrelevant data and feature statistics matching, achieving significant performance improvements over existing source-free methods.

Cost-effective depth and infrared sensors as alternatives to usual RGB sensors are now a reality, and have some advantages over RGB in domains like autonomous navigation and remote sensing. As such, building computer vision and deep learning systems for depth and infrared data is crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a large, well-labeled dataset in the source modality (RGB) to a neural network that works on a target modality (depth, infrared, etc.) is of great value. For reasons like memory and privacy, it may not be possible to access the source data, and knowledge transfer needs to work with only the source models. We describe an effective solution, SOCKET: SOurce-free Cross-modal KnowledgE Transfer, for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. The framework reduces the modality gap using paired task-irrelevant data, as well as by matching the mean and variance of the target features with the batch-norm statistics that are present in the source models. We show through extensive experiments that, on classification tasks, our method significantly outperforms existing source-free methods, which do not account for the modality gap.
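
The two ingredients named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`channel_stats`, `stat_matching_loss`, `paired_feature_loss`), the squared-error form of both losses, and the use of population variance are assumptions made for illustration, since the abstract does not specify them. Features are represented as plain lists of per-channel values.

```python
from statistics import fmean, pvariance


def channel_stats(features):
    """Per-channel mean and population variance of a batch of feature vectors."""
    channels = list(zip(*features))  # transpose: one tuple of values per channel
    means = [fmean(c) for c in channels]
    variances = [pvariance(c) for c in channels]
    return means, variances


def stat_matching_loss(target_features, source_mean, source_var):
    """Squared gap between target batch statistics and the stored source
    batch-norm statistics (assumed loss form, hypothetical helper)."""
    means, variances = channel_stats(target_features)
    mean_term = sum((m - sm) ** 2 for m, sm in zip(means, source_mean))
    var_term = sum((v - sv) ** 2 for v, sv in zip(variances, source_var))
    return mean_term + var_term


def paired_feature_loss(source_feats, target_feats):
    """Mean squared distance between features of paired task-irrelevant
    samples, e.g. an RGB image and its depth counterpart (assumed loss form)."""
    total = 0.0
    for s, t in zip(source_feats, target_feats):
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / len(source_feats)


# Toy batch of two target-modality feature vectors with two channels each.
target_batch = [[1.0, 2.0], [3.0, 6.0]]

# Statistics that already match the batch give zero loss.
print(stat_matching_loss(target_batch, [2.0, 4.0], [1.0, 4.0]))

# Mismatched source statistics give a positive loss to minimize.
print(stat_matching_loss(target_batch, [0.0, 0.0], [1.0, 1.0]))
```

In a real pipeline these quantities would be computed on network activations and minimized by gradient descent on the target model; the sketch only shows what is being compared in each case.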
