Pixel-global Self-supervised Learning with Uncertainty-aware Context Stabilizer
This work addresses the need for robust self-supervised learning methods that serve both discriminative and dense predictive tasks. The contribution is incremental in that it builds on existing teacher-student architectures.
The paper aims to capture both global and pixel-level consistency in self-supervised learning for downstream tasks. It introduces an uncertainty-aware context stabilizer based on Monte Carlo dropout and reports improved performance on benchmarks such as ImageNet and COCO.
We developed a novel self-supervised learning (SSL) approach that captures global consistency and pixel-level local consistency between differently augmented views of the same image, so that the learned representations suit both downstream discriminative and dense predictive tasks. We adopted the teacher-student architecture used in previous contrastive SSL methods. In our method, global consistency is enforced by aggregating the compressed representations of augmented views of the same image. Pixel-level consistency is enforced by pursuing similar representations for the same pixel in differently augmented views. Importantly, we introduced an uncertainty-aware context stabilizer that adaptively preserves the context gap between the two differently augmented views. Moreover, we used Monte Carlo dropout in the stabilizer to measure uncertainty and adaptively balance the discrepancy between the representations of the same pixels in different views.
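To make the role of Monte Carlo dropout concrete, the following is a minimal sketch and not the authors' implementation. It assumes the stabilizer can be approximated by a single dense layer with dropout, that uncertainty is the variance across stochastic forward passes, and that each pixel's consistency term is weighted by exp(-uncertainty). The function names, the weighting rule, and the cosine distance are all hypothetical choices made for illustration.

```python
import math
import random


def dropout(vec, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def linear(vec, weights):
    """Apply a dense layer given as a list of weight rows (no bias)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def mc_dropout_stats(vec, weights, p=0.2, passes=20, seed=0):
    """Run several stochastic passes; return the mean output and a scalar uncertainty."""
    rng = random.Random(seed)
    outs = [linear(dropout(vec, p, rng), weights) for _ in range(passes)]
    dims = len(outs[0])
    mean = [sum(o[d] for o in outs) / passes for d in range(dims)]
    var = [sum((o[d] - mean[d]) ** 2 for o in outs) / passes for d in range(dims)]
    return mean, sum(var) / dims  # uncertainty = variance averaged over dimensions


def cosine_distance(a, b):
    """One minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def weighted_pixel_loss(student_pixels, teacher_pixels, weights, p=0.2, passes=20):
    """Uncertainty-weighted average of per-pixel cosine distances.

    student_pixels[i] and teacher_pixels[i] are assumed to describe the same
    image location seen in two augmented views.
    """
    total, norm = 0.0, 0.0
    for i, (s, t) in enumerate(zip(student_pixels, teacher_pixels)):
        mean_s, uncertainty = mc_dropout_stats(s, weights, p, passes, seed=i)
        w = math.exp(-uncertainty)  # assumed rule: uncertain pixels count less
        total += w * cosine_distance(mean_s, t)
        norm += w
    return total / norm
```

Under these assumptions, pixels whose stabilized representation varies strongly across dropout masks contribute less to the consistency term, which is one plausible reading of "adaptively balance the discrepancy". With `p=0.0` the weighting reduces to a plain average of cosine distances.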