Exploiting Image Translations via Ensemble Self-Supervised Learning for Unsupervised Domain Adaptation
This work addresses the problem of adapting models from synthetic to real-world data for semantic segmentation, which is incremental as it builds on existing UDA methods with ensemble and self-supervised techniques.
The paper tackles unsupervised domain adaptation for semantic segmentation by combining multiple image translations, ensemble learning, and self-supervised learning, achieving state-of-the-art results on benchmarks like GTA V and Synthia to Cityscapes with improved mean intersection over union metrics.
We introduce an unsupervised domain adaption (UDA) strategy that combines multiple image translations, ensemble learning and self-supervised learning in one coherent approach. We focus on one of the standard tasks of UDA in which a semantic segmentation model is trained on labeled synthetic data together with unlabeled real-world data, aiming to perform well on the latter. To exploit the advantage of using multiple image translations, we propose an ensemble learning approach, where three classifiers calculate their prediction by taking as input features of different image translations, making each classifier learn independently, with the purpose of combining their outputs by sparse Multinomial Logistic Regression. This regression layer known as meta-learner helps to reduce the bias during pseudo label generation when performing self-supervised learning and improves the generalizability of the model by taking into consideration the contribution of each classifier. We evaluate our method on the standard UDA benchmarks, i.e. adapting GTA V and Synthia to Cityscapes, and achieve state-of-the-art results in the mean intersection over union metric. Extensive ablation experiments are reported to highlight the advantageous properties of our proposed UDA strategy.