The Last Mile to Supervised Performance: Semi-Supervised Domain Adaptation for Semantic Segmentation
This work addresses the problem of reducing annotation effort for dense prediction tasks such as semantic segmentation, and offers a practical solution for domain adaptation with limited labeled data. The contribution is incremental, as it builds on existing unsupervised domain adaptation (UDA) and semi-supervised learning (SSL) methods.
The paper tackles the challenge of reaching supervised performance in semantic segmentation at minimal annotation cost. It proposes a Semi-Supervised Domain Adaptation (SSDA) framework that combines consistency regularization, pixel contrastive learning, and self-training, and it shows that as few as 50 target labels can achieve near-supervised performance on the GTA-to-Cityscapes benchmark.
Supervised deep learning requires massive labeled datasets, but obtaining annotations is not always easy or possible, especially for dense tasks like semantic segmentation. To overcome this issue, numerous works explore Unsupervised Domain Adaptation (UDA), which uses a labeled dataset from another domain (source), or Semi-Supervised Learning (SSL), which trains on a partially labeled set. Despite the success of UDA and SSL, reaching supervised performance at a low annotation cost remains a notoriously elusive goal. To address this, we study the promising setting of Semi-Supervised Domain Adaptation (SSDA). We propose a simple SSDA framework that combines consistency regularization, pixel contrastive learning, and self-training to effectively utilize a few target-domain labels. Our method outperforms prior art on the popular GTA-to-Cityscapes benchmark and shows that as few as 50 target labels can suffice to achieve near-supervised performance. Additional results on Synthia-to-Cityscapes, GTA-to-BDD, and Synthia-to-BDD further demonstrate the effectiveness and practical utility of the method. Lastly, we find that existing UDA and SSL methods are not well-suited for the SSDA setting and discuss design patterns to adapt them.
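To make the three ingredients named above more concrete, the following is a minimal NumPy sketch of how consistency regularization with self-training and a pixel contrastive term might be combined into one objective. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the loss forms, so the confidence threshold, the temperature, the loss weights `w_st` and `w_con`, and all function names here are hypothetical, and the contrastive term is a generic supervised InfoNCE loss chosen for illustration. Pixels are assumed to be flattened to shape `(N, C)` for class probabilities and `(N, D)` for embeddings.

```python
import numpy as np


def cross_entropy(probs, labels, mask=None):
    """Mean per-pixel cross-entropy.

    probs: (N, C) class probabilities, labels: (N,) int class ids,
    mask: optional (N,) bool array selecting which pixels contribute.
    """
    eps = 1e-12
    picked = probs[np.arange(labels.shape[0]), labels]
    losses = -np.log(picked + eps)
    if mask is None:
        return float(losses.mean())
    if not mask.any():
        return 0.0
    return float(losses[mask].mean())


def self_training_consistency_loss(teacher_probs, student_probs, threshold=0.9):
    """Pseudo-labels from a weakly augmented view supervise a strongly
    augmented view; only pixels above the (assumed) threshold are used."""
    pseudo = teacher_probs.argmax(axis=1)
    confident = teacher_probs.max(axis=1) >= threshold
    return cross_entropy(student_probs, pseudo, confident), confident


def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised InfoNCE over pixel embeddings: pixels of the
    same class are positives, all other pixels are negatives."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    has_pos = positives.any(axis=1)
    if not has_pos.any():
        return 0.0
    # Cosine similarity between L2-normalized embeddings (assumed nonzero).
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = np.where(not_self, z @ z.T / temperature, -np.inf)
    # Numerically stable log-softmax over all non-self pixels.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / positives.sum(axis=1)[has_pos]
    return float(per_anchor.mean())


def ssda_objective(sup_loss, st_loss, con_loss, w_st=1.0, w_con=0.1):
    """Weighted sum of the three terms; the weights are placeholders."""
    return sup_loss + w_st * st_loss + w_con * con_loss


# Toy usage with two classes and a handful of pixels.
labeled_probs = np.array([[0.8, 0.2], [0.3, 0.7]])
labeled_ids = np.array([0, 1])
teacher = np.array([[0.95, 0.05], [0.6, 0.4]])
student = np.array([[0.5, 0.5], [0.1, 0.9]])
emb = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
emb_ids = np.array([0, 0, 1, 1])

sup = cross_entropy(labeled_probs, labeled_ids)
st, used = self_training_consistency_loss(teacher, student)
con = pixel_contrastive_loss(emb, emb_ids)
print("total loss:", ssda_objective(sup, st, con))
```

In this sketch the supervised term would be computed on labeled source pixels and on the few labeled target pixels, while the self-training and contrastive terms can also draw on unlabeled target pixels through pseudo-labels. How the actual method balances and schedules these terms is not stated in the abstract.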