CV | Oct 8, 2023

Enhancing Representations through Heterogeneous Self-Supervised Learning

arXiv:2310.05108v4 | 3 citations | h-index: 13 | Has Code
Originality: Incremental advance
AI Analysis

This work improves self-supervised learning in computer vision by pairing heterogeneous architectures, offering incremental gains in representation quality for tasks such as segmentation and detection.

The paper tackles the under-exploited complementarity between heterogeneous architectures in self-supervised learning. It proposes Heterogeneous Self-Supervised Learning (HSSL), in which a base model learns from an auxiliary head with a different architecture. The authors report improved representation quality and superior performance on downstream tasks such as image classification and object detection.

Incorporating heterogeneous representations from different architectures has facilitated various vision tasks; for example, some hybrid networks combine transformers and convolutions. However, the complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which makes a base model learn from an auxiliary head whose architecture is heterogeneous from that of the base model. In this process, HSSL endows the base model with new characteristics through representation learning, without structural changes. To comprehensively understand HSSL, we conduct experiments on various heterogeneous pairs, each containing a base model and an auxiliary head. We discover that the representation quality of the base model improves as the architectural discrepancy between the two grows. This observation motivates us to propose a search strategy that quickly determines the most suitable auxiliary head for a specific base model, along with several simple but effective methods to enlarge the model discrepancy. HSSL is compatible with various self-supervised methods and achieves superior performance on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection. The code is available at https://github.com/NK-JittorCV/Self-Supervised/.
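
The core idea in the abstract, a base model learning from an auxiliary head with a different architecture, can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the token-wise linear `base_model`, the attention-style `auxiliary_head`, the shared projection `w_proj`, the temperatures, and the cross-entropy objective are all hypothetical stand-ins. The sketch only computes a forward loss; a real setup would also backpropagate through the base model while treating the head's output as a fixed target.

```python
import numpy as np

def softmax(x, temperature=1.0):
    # Numerically stable softmax over the last axis.
    z = x / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def base_model(tokens, w_base):
    # Hypothetical base architecture: each token is mapped independently
    # (local processing only, no interaction between tokens).
    return np.tanh(tokens @ w_base)

def auxiliary_head(features, w_aux):
    # Hypothetical heterogeneous head: attention-style global mixing
    # across tokens, stacked on top of the base features.
    scores = features @ features.T / np.sqrt(features.shape[-1])
    return softmax(scores) @ features @ w_aux

def hssl_style_loss(tokens, w_base, w_aux, w_proj, t_base=0.1, t_aux=0.04):
    # Assumed objective: cross-entropy that pulls the base model's
    # prediction toward the auxiliary head's (sharper) prediction.
    feats = base_model(tokens, w_base)
    aux_feats = auxiliary_head(feats, w_aux)
    base_pred = softmax(feats.mean(axis=0) @ w_proj, t_base)
    aux_target = softmax(aux_feats.mean(axis=0) @ w_proj, t_aux)  # fixed target
    return float(-(aux_target * np.log(base_pred + 1e-12)).sum())

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))    # 16 tokens (e.g. image patches), 8 dims each
w_base = rng.normal(size=(8, 8))
w_aux = rng.normal(size=(8, 8))
w_proj = rng.normal(size=(8, 32))    # projection onto 32 pseudo-classes
loss = hssl_style_loss(tokens, w_base, w_aux, w_proj)
print(f"HSSL-style loss: {loss:.4f}")
```

The two stand-in modules differ deliberately: one never mixes information across tokens and the other mixes globally, which mirrors the abstract's point that a larger architectural discrepancy gives the base model more to learn from the head.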

