CV · AI · LG · RO · Apr 24, 2023

Augmentation-based Domain Generalization for Semantic Segmentation

arXiv:2304.12122v1 · 15 citations · h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses the lack of generalization of deep neural networks to unseen domains and offers researchers and practitioners in computer vision a simpler, more efficient approach. The contribution is incremental, however, since it builds on existing augmentation techniques.

The paper tackles domain generalization in semantic segmentation by systematically evaluating simple image augmentations. It finds that combined augmentations perform competitively with state-of-the-art methods, reaching 39.5% mIoU on the Synthia-to-Cityscapes shift and 44.2% mIoU when combined with the DAFormer architecture.
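The mIoU figures are mean intersection-over-union scores. As a minimal sketch of how such a score is commonly computed (not the paper's evaluation code), the snippet below accumulates a confusion matrix over flattened per-pixel predictions and averages the per-class IoU = TP / (TP + FP + FN). The `ignore_index=255` default follows the usual Cityscapes "void" label convention and is an assumption here, as is skipping classes that appear in neither predictions nor labels; the exact class subset used for Synthia-to-Cityscapes evaluation is not stated in this summary.

```python
from typing import List, Optional, Sequence

Matrix = List[List[int]]


def confusion_matrix(preds: Sequence[int], labels: Sequence[int],
                     num_classes: int, ignore_index: int = 255,
                     cm: Optional[Matrix] = None) -> Matrix:
    """Accumulate a (label x prediction) pixel-count matrix.

    Pass an existing `cm` to keep accumulating across images.
    Pixels whose label equals `ignore_index` are skipped.
    """
    if cm is None:
        cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: Matrix) -> float:
    """Mean over classes of IoU = TP / (TP + FP + FN).

    Classes with no ground-truth and no predicted pixels are skipped
    (an assumed convention; evaluation code bases differ here).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # labelled c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, labelled other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened "image" with 3 classes; the last pixel is void (255).
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], num_classes=3)
print(mean_iou(cm))  # (1/2 + 2/3 + 1) / 3 = 13/18, about 0.722
```

In a full evaluation the confusion matrix is typically summed over all validation images (by passing `cm` back in) before the per-class IoUs are averaged, rather than averaging a per-image mIoU.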

Unsupervised Domain Adaptation (UDA) and domain generalization (DG) are two research areas that aim to tackle the lack of generalization of Deep Neural Networks (DNNs) towards unseen domains. While UDA methods have access to unlabeled target images, domain generalization does not involve any target data and learns generalized features from a source domain only. Image-style randomization or augmentation is a popular approach to improve network generalization without access to the target domain. However, many proposed methods are complex and overlook the potential of simple image augmentations for out-of-domain generalization. For this reason, we systematically study the in- and out-of-domain generalization capabilities of simple, rule-based image augmentations such as blur, noise, color jitter and many more. Based on a full factorial design of experiments, we provide a systematic statistical evaluation of augmentations and their interactions. Our analysis yields both expected and unexpected outcomes. Expected, because our experiments confirm the common finding that combining multiple different augmentations outperforms single augmentations. Unexpected, because combined augmentations perform competitively with state-of-the-art domain generalization approaches, while being significantly simpler and adding no training overhead. On the challenging synthetic-to-real domain shift between Synthia and Cityscapes, we reach 39.5% mIoU compared to 40.9% mIoU of the best previous work. When additionally employing the recent vision transformer architecture DAFormer, we outperform these benchmarks with 44.2% mIoU.
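A full factorial design trains and evaluates every on/off combination of k augmentations, i.e. 2**k runs. The sketch below is a hypothetical illustration, not the authors' code: it enumerates such a design for three of the augmentations named in the abstract (the study itself covers more) and estimates main and two-factor interaction effects with the standard contrast in -1/+1 coding (mean response where the coded product is +1 minus mean response where it is -1). The responses in the demo come from an invented linear model purely to show the arithmetic; in the study, each response would be an mIoU from a complete training run, and a significance test such as ANOVA would usually follow.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence

# Hypothetical factor list: three augmentations named in the abstract.
# The actual study evaluates more augmentations than these.
FACTORS = ("blur", "noise", "color_jitter")

Run = Dict[str, int]


def full_factorial(factors: Sequence[str]) -> List[Run]:
    """All 2**k on/off combinations, coded -1 (off) and +1 (on)."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]


def effect(runs: Sequence[Run], responses: Sequence[float],
           columns: Sequence[str]) -> float:
    """Effect of one factor (one column) or an interaction (several columns).

    The sign of a run is the product of its coded levels over `columns`;
    the effect is mean(response | sign=+1) - mean(response | sign=-1).
    """
    pos: List[float] = []
    neg: List[float] = []
    for run, y in zip(runs, responses):
        sign = 1
        for c in columns:
            sign *= run[c]
        (pos if sign > 0 else neg).append(y)
    return sum(pos) / len(pos) - sum(neg) / len(neg)


if __name__ == "__main__":
    design = full_factorial(FACTORS)
    print(len(design), "runs")  # 2**3 = 8 training runs
    # Invented responses for illustration only (NOT results from the paper):
    # y = 30 + 2*blur + 1*noise + 0.5*blur*noise
    ys = [30 + 2 * r["blur"] + r["noise"] + 0.5 * r["blur"] * r["noise"]
          for r in design]
    for f in FACTORS:
        print(f, effect(design, ys, [f]))
    for a, b in combinations(FACTORS, 2):
        print(f"{a} x {b}", effect(design, ys, [a, b]))
```

With these invented responses the script reports effects of 4.0 for blur, 2.0 for noise, 0.0 for color_jitter and 1.0 for the blur x noise interaction. In -1/+1 coding an effect equals twice the corresponding model coefficient, and because the full factorial design is balanced, each contrast is unaffected by the other terms.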
