Semi-Supervised Semantic Segmentation with Cross-Consistency Training
This work addresses the challenge of leveraging unlabeled data for semantic segmentation, which matters for applications such as autonomous driving and medical imaging. The contribution is incremental, since it builds on existing consistency training frameworks.
The paper tackles semi-supervised semantic segmentation by proposing cross-consistency training, which enforces prediction invariance under perturbations of the encoder outputs, and reports state-of-the-art results on multiple datasets.
In this paper, we present a novel cross-consistency based semi-supervised approach for semantic segmentation. Consistency training has proven to be a powerful semi-supervised learning framework for leveraging unlabeled data under the cluster assumption, which states that the decision boundary should lie in low-density regions. In this work, we first observe that for semantic segmentation, the low-density regions are more apparent within the hidden representations than within the inputs. We thus propose cross-consistency training, where invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder. Concretely, a shared encoder and a main decoder are trained in a supervised manner using the available labeled examples. To leverage the unlabeled examples, we enforce consistency between the predictions of the main decoder and those of the auxiliary decoders, which take as inputs different perturbed versions of the encoder's output; this in turn improves the encoder's representations. The proposed method is simple and can easily be extended to use additional training signals, such as image-level labels or pixel-level labels across different domains. We perform an ablation study to tease apart the effectiveness of each component, and conduct extensive experiments to demonstrate that our method achieves state-of-the-art results on several datasets.
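
The unlabeled-data term described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the linear "decoders", the two perturbation functions (`feature_noise`, `feature_dropout`), the mean-squared-error distance, and all shapes are assumptions chosen to keep the example short. The main decoder's output is treated as a fixed target, which is a common choice in consistency training and is assumed here as well.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def feature_noise(z, rng, scale=0.3):
    # Hypothetical perturbation: multiplicative uniform noise on the features.
    return z + z * rng.uniform(-scale, scale, size=z.shape)

def feature_dropout(z, rng, drop_prob=0.5):
    # Hypothetical perturbation: randomly zero out feature entries.
    keep = rng.random(z.shape) >= drop_prob
    return z * keep

def cross_consistency_loss(z, main_w, aux_ws, perturbations, rng):
    """Average distance between main and auxiliary decoder predictions.

    z:      (num_pixels, feat_dim) encoder output for unlabeled pixels
    main_w: (feat_dim, num_classes) weights of a toy linear main decoder
    aux_ws: list of weights, one toy linear auxiliary decoder each
    """
    # Main decoder sees the clean encoder output; used as a fixed target.
    target = softmax(z @ main_w)
    losses = []
    for w, perturb in zip(aux_ws, perturbations):
        # Each auxiliary decoder sees its own perturbed version of z.
        pred = softmax(perturb(z, rng) @ w)
        # Mean squared error is an assumed choice of distance.
        losses.append(np.mean((pred - target) ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 8))          # 16 pixels, 8 feature channels
main_w = rng.normal(size=(8, 3))      # 3 classes
aux_ws = [rng.normal(size=(8, 3)) for _ in range(2)]
loss = cross_consistency_loss(
    z, main_w, aux_ws, [feature_noise, feature_dropout], rng
)
```

In a full training loop this term would be added, with some weighting, to the supervised loss computed from the main decoder on labeled examples. The weighting and its schedule are not specified in the text above.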