Boosting Contrastive Learning with Relation Knowledge Distillation
This work addresses semantic collapse in lightweight models trained with self-supervised learning; it represents an incremental improvement in domain-specific applications.
The paper tackles the performance gap between self-supervised and supervised methods in lightweight models by proposing a relation-wise contrastive paradigm with Relation Knowledge Distillation, which improves linear evaluation on AlexNet from 44.7% to 50.1%, approaching supervised performance of 50.5%.
While self-supervised representation learning (SSL) has proved effective for large models, a large gap remains between SSL and supervised methods for lightweight models trained with the same recipe. We investigate this problem and find that lightweight models are prone to collapse in semantic space when trained with instance-wise contrast alone. To address this issue, we propose a relation-wise contrastive paradigm with Relation Knowledge Distillation (ReKD). We introduce a heterogeneous teacher that explicitly mines semantic information and transfers a novel form of relation knowledge to the student (the lightweight model). Theoretical analysis supports our main concern about instance-wise contrast and verifies the effectiveness of our relation-wise contrastive learning. Extensive experiments also demonstrate that our method achieves significant improvements on multiple lightweight models. In particular, linear evaluation on AlexNet improves the current state of the art from 44.7% to 50.1%, making this the first work to come close to the supervised 50.5%. Code will be made available.
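The abstract does not spell out the loss, so the following is only a minimal sketch of one plausible reading of "relation knowledge": each sample is described by its softmax-normalized similarities to a shared set of anchor samples, and the student is trained to match the teacher's relation distribution. The function names, the anchor set, the cross-entropy form, and the temperature values are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each embedding to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relation_distribution(embeddings, anchors, temperature):
    # Relation of each sample to every anchor, as a probability distribution.
    sims = l2_normalize(embeddings) @ l2_normalize(anchors).T
    return softmax(sims / temperature)

def relation_distillation_loss(student_emb, teacher_emb,
                               student_anchors, teacher_anchors,
                               t_student=0.2, t_teacher=0.05):
    # Hypothetical loss: cross-entropy between teacher and student relations.
    # The teacher and student may use different embedding sizes (a
    # "heterogeneous" teacher), because relations are computed separately in
    # each space over the same anchor samples. Temperatures are assumed values.
    p_teacher = relation_distribution(teacher_emb, teacher_anchors, t_teacher)
    p_student = relation_distribution(student_emb, student_anchors, t_student)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=1).mean())

# Toy usage: 4 samples and 10 anchors, with 8-d student and 16-d teacher spaces.
rng = np.random.default_rng(0)
student_emb = rng.normal(size=(4, 8))
teacher_emb = rng.normal(size=(4, 16))
student_anchors = rng.normal(size=(10, 8))
teacher_anchors = rng.normal(size=(10, 16))
loss = relation_distillation_loss(student_emb, teacher_emb,
                                  student_anchors, teacher_anchors)
```

Under this reading, the student is supervised by how the teacher groups a sample relative to other samples, not only by whether two views come from the same instance, which is the kind of semantic signal the abstract says instance-wise contrast lacks in lightweight models.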