MSVQ: Self-Supervised Learning with Multiple Sample Views and Queues
This work addresses a specific bottleneck in contrastive self-supervised learning for computer vision, the false negative sample problem, and offers an incremental improvement over existing methods.
The paper tackles the problem of false negative samples in contrastive self-supervised learning by proposing MSVQ, a framework that uses multiple sample views and queues to improve representation learning. The authors report high effectiveness and efficiency in classification on four benchmark image datasets.
Self-supervised methods based on contrastive learning have achieved great success in unsupervised visual representation learning. However, most methods under this framework suffer from the problem of false negative samples. Inspired by mean shift for self-supervised learning, we propose a new, simple framework, namely Multiple Sample Views and Queues (MSVQ). We jointly construct three soft labels on-the-fly by utilizing two complementary and symmetric approaches: multiple augmented positive views and two momentum encoders that generate various semantic features for negative samples. Two teacher networks compute similarity relationships with the negative samples and then transfer this knowledge to the student network. The student network learns to mimic these similarity relationships between samples, which gives it a more flexible ability to identify false negative samples in the dataset. The classification results on four benchmark image datasets demonstrate the high effectiveness and efficiency of our approach compared with several classical methods. Source code and pretrained models are available \href{https://github.com/pc-cp/MSVQ}{here}.
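The teacher-to-student transfer described above can be illustrated with a minimal sketch. The abstract does not state the loss, so everything here is an assumption: the soft label is taken to be a temperature-scaled softmax over cosine similarities between an embedding and a queue of negatives, and the student is trained with a cross-entropy against each teacher's soft label, averaged over teachers. The function names, the temperatures `tau_student` and `tau_teacher`, and the averaging are hypothetical and are not taken from the MSVQ code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(embedding, queue, temperature):
    """Soft label: distribution over queue entries by cosine similarity."""
    q = l2_normalize(embedding)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(k))) for k in queue]
    return softmax([s / temperature for s in sims])


def soft_label_loss(student_emb, teacher_embs, queue,
                    tau_student=0.1, tau_teacher=0.05):
    """Average cross-entropy between each teacher soft label and the student.

    Assumption: one shared queue and a plain mean over teacher views; the
    actual method may use separate queues and a different weighting.
    """
    p_student = similarity_distribution(student_emb, queue, tau_student)
    losses = []
    for teacher_emb in teacher_embs:
        p_teacher = similarity_distribution(teacher_emb, queue, tau_teacher)
        losses.append(-sum(pt * math.log(ps)
                           for pt, ps in zip(p_teacher, p_student)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 2-D embeddings; real features would come from the encoders.
    queue = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    teachers = [[1.0, 0.2], [0.9, 0.3]]
    print(soft_label_loss([0.8, 0.4], teachers, queue))
```

Because the target is a distribution rather than a one-hot label, a queue entry that is semantically close to the query receives a high target probability instead of being pushed away as a negative. This is one plausible reading of how soft labels help with false negative samples.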