CV · MM · May 23, 2023

Source-Free Domain Adaptation for RGB-D Semantic Segmentation with Vision Transformers

arXiv:2305.14269v2 · 17 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of adapting semantic segmentation models to new domains without access to the source data, which is relevant to researchers in multimodal AI. It is an incremental advance, since it builds on existing vision transformer and domain adaptation methods.

The paper tackles source-free domain adaptation for RGB-D semantic segmentation by proposing MISFIT, a depth-aware vision transformer framework that injects depth data at multiple stages and uses style transfer and entropy minimization, achieving noticeable performance improvements over standard strategies.
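A small sketch helps show what depth-aware entropy minimization means in practice. Self-training on unlabeled target images often minimizes the Shannon entropy of the per-pixel class predictions, which pushes the model toward confident outputs. Here each pixel's entropy is additionally weighted by its depth. This is not MISFIT's published loss. The linear weighting, the `d_max` cap, the `alpha` parameter, and the choice to give farther pixels larger weights are all hypothetical assumptions made for illustration.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (natural log) of one pixel's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def depth_weight(depth, d_max=10.0, alpha=1.0):
    """Hypothetical weighting (assumption): weight grows linearly with depth,
    from 1 at depth 0 up to 1 + alpha at depth >= d_max."""
    return 1.0 + alpha * min(depth / d_max, 1.0)


def depth_weighted_entropy(prob_map, depth_map, d_max=10.0, alpha=1.0):
    """Weighted mean of per-pixel entropies over a flattened image.

    prob_map:  list of per-pixel class-probability lists (softmax outputs)
    depth_map: list of per-pixel depth values, aligned with prob_map
    """
    total, norm = 0.0, 0.0
    for probs, depth in zip(prob_map, depth_map):
        w = depth_weight(depth, d_max, alpha)
        total += w * pixel_entropy(probs)
        norm += w
    return total / norm


# Two pixels: a confident one close to the camera and an uncertain one far away.
probs = [[1.0, 0.0], [0.5, 0.5]]
depths = [0.0, 10.0]
loss = depth_weighted_entropy(probs, depths)
print(round(loss, 4))
```

With these weights the distant, uncertain pixel counts twice as much as the near one, so the loss is 2·ln 2 / 3 instead of the unweighted ln 2 / 2. Whether distant regions should be emphasized or de-emphasized is exactly the kind of design choice the depth weighting is meant to control.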

With the increasing availability of depth sensors, multimodal frameworks that combine color information with depth data are gaining interest. However, ground truth data for semantic segmentation is burdensome to provide, thus making domain adaptation a significant research area. Yet most domain adaptation methods are not able to effectively handle multimodal data. Specifically, we address the challenging source-free domain adaptation setting where the adaptation is performed without reusing source data. We propose MISFIT: MultImodal Source-Free Information fusion Transformer, a depth-aware framework which injects depth data into a segmentation module based on vision transformers at multiple stages, namely at the input, feature and output levels. Color and depth style transfer helps early-stage domain alignment while re-wiring self-attention between modalities creates mixed features, allowing the extraction of better semantic content. Furthermore, a depth-based entropy minimization strategy is also proposed to adaptively weight regions at different distances. Our framework, which is also the first approach using RGB-D vision transformers for source-free semantic segmentation, shows noticeable performance improvements with respect to standard strategies.
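One way to picture "re-wiring self-attention between modalities" is cross-attention: each modality's tokens act as queries over the other modality's keys and values, so every output token mixes in content from the other stream. The following minimal pure-Python sketch illustrates this. It is not the paper's architecture. It omits the learned query, key and value projections, multiple heads and residual connections, and the function name `cross_modal_mix` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention with identity projections (simplification)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Each output token is a convex combination of the value tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def cross_modal_mix(rgb_tokens, depth_tokens):
    """Hypothetical rewiring: RGB queries attend to depth tokens and vice versa."""
    rgb_mixed = attention(rgb_tokens, depth_tokens, depth_tokens)
    depth_mixed = attention(depth_tokens, rgb_tokens, rgb_tokens)
    return rgb_mixed, depth_mixed


rgb = [[0.0, 1.0], [1.0, 0.0]]
depth = [[1.0, 2.0], [1.0, 2.0]]
rgb_mixed, depth_mixed = cross_modal_mix(rgb, depth)
print(rgb_mixed)
```

In the usage example, both depth tokens are identical, so every RGB query attends to them equally and receives `[1.0, 2.0]`. The depth queries, in turn, receive weighted averages of the RGB tokens. In a full transformer block these mixed features would normally pass through learned projections and residual paths.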
