CV · Mar 21, 2023

MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation

Vitaliy Kinakh, Mariia Drozdova, Slava Voloshynovskiy

arXiv:2303.12130v2 · 2.84 citations · h-index: 15 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient self-supervised learning and model-agnostic knowledge distillation without relying on contrastive learning or clustering. It offers a generic regularization framework; the contribution is incremental because it builds on existing multi-view and multi-representation ideas.

The authors approach self-supervised learning and knowledge distillation with a method based on multiple views and multiple representations, which maximizes the dependence between embeddings of augmented and non-augmented views. Among non-contrastive, clustering-free methods, it achieves state-of-the-art performance on STL10 and ImageNet-1K, and a ResNet50 model pretrained with their distillation method reaches state-of-the-art results in STL10 linear evaluation.

We present a new method of self-supervised learning and knowledge distillation based on multiple views and multiple representations (MV-MR). MV-MR is based on maximizing the dependence between learnable embeddings from augmented and non-augmented views, jointly with maximizing the dependence between learnable embeddings from the augmented view and multiple non-learnable representations from the non-augmented view. We show that the proposed method can be used for efficient self-supervised classification and model-agnostic knowledge distillation. Unlike other self-supervised techniques, our approach does not use any contrastive learning, clustering, or stop gradients. MV-MR is a generic framework that allows constraints to be imposed on the learnable embeddings by using image multi-representations as regularizers. Along this line, knowledge distillation is considered a particular case of such regularization. MV-MR achieves state-of-the-art performance on the STL10 and ImageNet-1K datasets among non-contrastive and clustering-free methods. We show that a lower-complexity ResNet50 model pretrained using the proposed knowledge distillation from the CLIP ViT model achieves state-of-the-art performance on STL10 linear evaluation. The code is available at: https://github.com/vkinakh/mv-mr
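The abstract states the objective only in words: maximize dependence between the augmented-view and non-augmented-view embeddings, plus dependence between the augmented-view embedding and several fixed (non-learnable) representations. The following is a minimal NumPy sketch of such an objective, assuming sample distance correlation as the dependence measure and a single scalar `weight` on the regularizing terms. The names `distance_correlation`, `mv_mr_style_loss`, and `weight` are hypothetical and this is not the authors' code (see the linked repository for that). It only evaluates the loss value; training an encoder with it would require an autodiff framework.

```python
import numpy as np


def distance_correlation(x: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Sample distance correlation between paired batches x (n, p) and y (n, q).

    Returns a value in [0, 1]; larger means stronger (possibly nonlinear) dependence.
    """

    def double_centered(z: np.ndarray) -> np.ndarray:
        # Pairwise Euclidean distances, then subtract row/column means and add the grand mean.
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    a = double_centered(x)
    b = double_centered(y)
    dcov_xy = (a * b).mean()  # squared distance covariance (V-statistic)
    dvar_x = (a * a).mean()   # squared distance variance of x
    dvar_y = (b * b).mean()   # squared distance variance of y
    dcor_sq = dcov_xy / (np.sqrt(dvar_x * dvar_y) + eps)
    return float(np.sqrt(max(dcor_sq, 0.0)))


def mv_mr_style_loss(z_aug: np.ndarray, z_clean: np.ndarray,
                     fixed_reprs: list, weight: float = 1.0) -> float:
    """Hypothetical MV-MR-style objective to be MINIMIZED.

    z_aug:       learnable embeddings of the augmented view.
    z_clean:     learnable embeddings of the non-augmented view.
    fixed_reprs: non-learnable representations of the non-augmented view
                 (e.g. handcrafted features, or a frozen teacher's embeddings
                 in the knowledge-distillation case).
    """
    # Negate dependence so that minimizing the loss maximizes dependence.
    loss = -distance_correlation(z_aug, z_clean)
    for r in fixed_reprs:
        loss -= weight * distance_correlation(z_aug, r)
    return loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_clean = rng.normal(size=(32, 16))
    z_aug = z_clean + 0.1 * rng.normal(size=(32, 16))  # stand-in for an augmented view
    handcrafted = rng.normal(size=(32, 5))             # stand-in for a fixed representation
    print(mv_mr_style_loss(z_aug, z_clean, [handcrafted]))
```

Distance correlation is used here because it captures nonlinear dependence between embeddings of different dimensionality without negative pairs, which matches the abstract's point that no contrastive term is needed; a differentiable version would replace the NumPy calls with their autodiff-framework equivalents.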

