CV · AI · Apr 8, 2025

TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation

arXiv:2504.05774v3 · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses domain adaptation challenges in semantic segmentation for computer vision applications, representing an incremental improvement over prior methods.

The paper tackles the problem of adapting Vision Transformers for semantic segmentation across different domains by addressing spatially varying transferability, and the proposed TMT framework outperforms existing methods in 20 cross-domain settings.

Recent advances in Vision Transformers (ViTs) have significantly improved semantic segmentation performance. However, their adaptation to new target domains remains challenged by distribution shifts, which often disrupt global attention mechanisms. While existing global and patch-level adaptation methods offer some improvements, they overlook the spatially varying transferability inherent in different image regions. To address this, we propose the Transferable Mask Transformer (TMT), a region-adaptive framework designed to enhance cross-domain representation learning through transferability guidance. First, we dynamically partition the image into coherent regions, grouped by structural and semantic similarity, and estimate their domain transferability at a localized level. Then, we incorporate region-level transferability maps directly into the self-attention mechanism of ViTs, allowing the model to adaptively focus attention on areas with lower transferability and higher semantic uncertainty. Extensive experiments across 20 diverse cross-domain settings demonstrate that TMT not only mitigates the performance degradation typically associated with domain shift but also consistently outperforms existing approaches.
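The abstract does not say how the transferability maps enter self-attention, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each patch token already carries a region id and each region a transferability score in [0, 1], and it adds a per-key bias of log(1 + alpha * (1 - t)) to the attention logits so that keys in low-transferability regions receive more weight. The function name `region_guided_attention`, the parameter `alpha`, and the bias form are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def region_guided_attention(queries, keys, values, region_ids,
                            region_transfer, alpha=1.0):
    """Single-head attention with a region-level bias (illustrative only).

    region_ids[j] is the (assumed) region of key token j, and
    region_transfer[r] is the (assumed) transferability of region r.
    """
    d = len(queries[0])
    # Hypothetical bias: zero for fully transferable regions (t == 1),
    # growing as transferability drops.
    bias = [math.log(1.0 + alpha * (1.0 - region_transfer[r]))
            for r in region_ids]
    outputs, weights = [], []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + b
                  for k, b in zip(keys, bias)]
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Four identical tokens: two in a well-transferring region (t = 0.9),
# two in a poorly transferring one (t = 0.2).
tokens = [[1.0, 0.0]] * 4
region_ids = [0, 0, 1, 1]
region_transfer = {0: 0.9, 1: 0.2}
_, attn = region_guided_attention(tokens, tokens, tokens,
                                  region_ids, region_transfer)
```

Because the tokens in this toy input are identical, the content term of the logits cancels and each attention row is proportional to 1 + alpha * (1 - t), so the two tokens in the low-transferability region get the larger share. An additive bias was chosen here because it leaves the softmax normalization intact; the paper may use a different mechanism.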
