CV · Apr 29, 2023

Regularizing Self-training for Unsupervised Domain Adaptation via Structural Constraints

CMU, DeepMind
arXiv:2305.00131v11 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in domain adaptation for semantic segmentation, offering an incremental enhancement to existing methods.

The paper tackles erroneous pseudo-labels in self-training for unsupervised domain adaptation (UDA) in semantic segmentation by using structural cues from depth maps to regularize training. The regularizer improves top-performing self-training methods by up to 2 points on several UDA benchmarks.

Self-training based on pseudo-labels has emerged as a dominant approach for addressing conditional distribution shifts in unsupervised domain adaptation (UDA) for semantic segmentation problems. A notable drawback, however, is that this family of approaches is susceptible to erroneous pseudo-labels that arise from confirmation biases in the source domain and that manifest as nuisance factors in the target domain. A possible source for this mismatch is the reliance on only photometric cues provided by RGB image inputs, which may ultimately lead to sub-optimal adaptation. To mitigate the effect of mismatched pseudo-labels, we propose to incorporate structural cues from auxiliary modalities, such as depth, to regularize conventional self-training objectives. Specifically, we introduce a contrastive pixel-level objectness constraint that pulls the pixel representations within a region of an object instance closer, while pushing those from different object categories apart. To obtain object regions consistent with the true underlying object, we extract information from both depth maps and RGB images in the form of multimodal clustering. Crucially, the objectness constraint is agnostic to the ground-truth semantic labels and, hence, appropriate for unsupervised domain adaptation. In this work, we show that our regularizer significantly improves top-performing self-training methods (by up to $2$ points) on various UDA benchmarks for semantic segmentation. We include all code in the supplementary.
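
To make the objectness constraint concrete, here is a minimal NumPy sketch of a contrastive loss over pixel embeddings, in which pixels sharing a region id act as positives and all other pixels act as negatives. It is an illustration under stated assumptions, not the paper's formulation: the function name `objectness_contrastive_loss`, the `temperature` parameter, and the InfoNCE-style form are hypothetical choices. The region ids are assumed to come from some clustering of depth and RGB cues, which is not shown. The sketch also simplifies by treating every other region as a negative, whereas the abstract describes pushing apart pixels from different object categories.

```python
import numpy as np


def objectness_contrastive_loss(features, region_ids, temperature=0.1):
    """Contrastive loss that pulls same-region pixel embeddings together.

    features:   (N, D) array of pixel embeddings, N >= 2 (assumed input).
    region_ids: (N,) integer array; equal ids mark pixels of one region
                (assumed to come from a depth + RGB clustering step).
    temperature: softmax temperature (hypothetical hyperparameter).
    """
    # L2-normalize so the dot product is a cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)
    logits = z @ z.T / temperature

    n = len(region_ids)
    not_self = ~np.eye(n, dtype=bool)
    # Positives: other pixels that carry the same region id as the anchor.
    positives = (region_ids[:, None] == region_ids[None, :]) & not_self

    # Log-softmax over all other pixels, excluding the anchor itself.
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(
        np.exp(logits - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = logits - log_denominator

    # Average log-probability of the positives, per anchor that has any.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(per_anchor.mean())
```

The loss is low when embeddings within a region are similar to each other and dissimilar to embeddings from other regions. Because it uses only region ids and never semantic labels, a term of this kind could be added to a self-training objective on unlabeled target-domain images.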

