Adversarially Domain-adaptive Latent Diffusion for Unsupervised Semantic Segmentation
This work addresses the challenge of reducing annotation costs for semantic segmentation by adapting models from synthetic to real-world domains, representing an incremental improvement over existing methods.
The paper tackles the problem of unsupervised domain adaptation for semantic segmentation by introducing a latent diffusion model with inter-coder connections and adversarial learning, achieving mIoU scores of 74.4 and 67.2 on benchmark datasets.
Semantic segmentation requires extensive pixel-level annotation, motivating unsupervised domain adaptation (UDA) to transfer knowledge from labelled source domains to unlabelled or weakly labelled target domains. One of the most efficient strategies involves using synthetic datasets generated within controlled virtual environments, such as video games or traffic simulators, which can automatically generate pixel-level annotations. However, even when such datasets are available, learning a well-generalised representation that captures both domains remains challenging, owing to probabilistic and geometric discrepancies between the virtual world and real-world imagery. This work introduces a semantic segmentation method based on latent diffusion models, termed Inter-Coder Connected Latent Diffusion (ICCLD), alongside an unsupervised domain adaptation approach. The model employs an inter-coder connection to enhance contextual understanding and preserve fine details, while adversarial learning aligns latent feature distributions across domains during the latent diffusion process. Experiments on GTA5, Synthia, and Cityscapes demonstrate that ICCLD outperforms state-of-the-art UDA methods, achieving mIoU scores of 74.4 (GTA5$\rightarrow$Cityscapes) and 67.2 (Synthia$\rightarrow$Cityscapes).