Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation
This work addresses the cost of pixel-wise annotation for semantic segmentation through domain adaptation, with a focus on urban scene understanding. The contribution is incremental, as it builds on existing image translation and feature alignment strategies.
The paper tackles unsupervised domain adaptation for semantic segmentation with a framework that reduces image translation bias and aligns cross-domain features. It does this by translating target images into the source style and reconstructing images from their predicted labels, and it achieves results competitive with state-of-the-art methods on synthetic-to-real urban scene adaptation.
Unsupervised domain adaptation helps alleviate the need for pixel-wise annotation in semantic segmentation. One of the most common strategies is to translate images from the source domain to the target domain and then align their marginal distributions in the feature space using adversarial learning. However, source-to-target translation enlarges the bias in the translated images and adds extra computation, because the source domain is typically much larger than the target domain and all of its images must be translated. Furthermore, global feature alignment cannot guarantee that the joint (feature, label) distributions of the source and target domains are consistent. Here, we present a framework designed to mitigate image translation bias and to align cross-domain features that belong to the same category. This is achieved by 1) performing target-to-source translation and 2) reconstructing both source and target images from their predicted labels. Extensive experiments on adaptation from synthetic to real urban scenes demonstrate that our framework competes favorably against existing state-of-the-art methods.
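The abstract does not specify the translation network, the reconstruction network, or the loss functions, so the following is only a minimal NumPy sketch of the two ideas under explicit assumptions. `match_channel_stats` is a crude per-channel colour-statistics transfer substituted for a learned target-to-source translation model; `reconstruct_from_labels` is a hypothetical 1x1 linear "reconstructor" mapping class probabilities to RGB; and the L1 reconstruction loss is an assumed choice. None of these names come from the paper.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def match_channel_stats(target_img: np.ndarray, source_mean: np.ndarray,
                        source_std: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """ASSUMPTION: crude stand-in for target-to-source translation.

    Re-normalises each RGB channel of an (H, W, 3) target image to the
    source-domain channel mean/std. Only target images are translated;
    the (larger) source set is left untouched.
    """
    mean = target_img.mean(axis=(0, 1))
    std = target_img.std(axis=(0, 1))
    return (target_img - mean) / (std + eps) * source_std + source_mean


def reconstruct_from_labels(probs: np.ndarray, class_colors: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL reconstructor: a 1x1 linear map from (H, W, C) class
    probabilities to an (H, W, 3) image via a (C, 3) per-class colour table.
    A real system would use a learned conditional generator instead."""
    return probs @ class_colors


def reconstruction_loss(image: np.ndarray, logits: np.ndarray,
                        class_colors: np.ndarray) -> float:
    """Mean L1 distance between an image and the image rebuilt from its
    predicted labels (ASSUMED loss). Because the reconstructor sees only
    labels, low loss requires predictions that are consistent per category."""
    probs = softmax(logits, axis=-1)
    recon = reconstruct_from_labels(probs, class_colors)
    return float(np.abs(recon - image).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C = 4, 4, 3
    class_colors = np.eye(C)                 # hypothetical: class k -> pure colour k
    labels = rng.integers(0, C, size=(H, W))
    image = class_colors[labels]             # toy image made of class colours

    confident = 20.0 * np.eye(C)[labels]     # logits strongly favouring the true class
    uniform = np.zeros((H, W, C))            # logits with no class preference
    print(reconstruction_loss(image, confident, class_colors))  # close to 0
    print(reconstruction_loss(image, uniform, class_colors))    # about 0.444 (4/9)
```

The toy example only illustrates the direction of the signal: confident, correct label maps reconstruct the image well, while uninformative predictions do not. Translating target images toward the source style, rather than the reverse, means only the smaller target set is transformed, which matches the computational argument in the abstract.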