Deciphering the Projection Head: Representation Evaluation Self-supervised Learning
This work addresses how to optimize representation learning in self-supervised learning for better downstream task performance and robustness. The contribution is an incremental improvement.
The paper systematically investigates the role of the projection head in self-supervised learning and finds that it targets uniformity, pushing dissimilar samples apart. It proposes a Representation Evaluation Design (RED) that builds a shortcut connection between the representation and projection vectors. RED consistently improves baseline models such as SimCLR, MoCo-V2, and SimSiam on downstream tasks, and the learned representations are more robust to unseen augmentations and out-of-distribution data.
Self-supervised learning (SSL) aims to learn intrinsic features without labels. Despite the diverse architectures of SSL methods, the projection head always plays an important role in improving the performance of the downstream task. In this work, we systematically investigate the role of the projection head in SSL. Specifically, the projection head targets the uniformity part of SSL, which pushes the dissimilar samples away from each other, thus enabling the encoder to focus on extracting semantic features. Based on this understanding, we propose a Representation Evaluation Design (RED) in SSL models in which a shortcut connection between the representation and the projection vectors is built. Extensive experiments with different architectures, including SimCLR, MoCo-V2, and SimSiam, on various datasets, demonstrate that the representation evaluation design can consistently improve the baseline models in the downstream tasks. The learned representation from the RED-SSL models shows superior robustness to unseen augmentations and out-of-distribution data.
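To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the shortcut is built or how uniformity is measured, so both choices here are assumptions: the shortcut is modeled as a residual-style addition of the representation to the output of a two-layer projection MLP, and uniformity uses one commonly used formulation (the log of the mean pairwise Gaussian potential between L2-normalized vectors). The names `project_with_shortcut`, `uniformity`, `linear`, and `l2_normalize`, and the temperature `t=2.0`, are hypothetical and chosen only for illustration.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def uniformity(vectors, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values mean the normalized vectors are spread more evenly
    over the unit sphere. ASSUMPTION: this is a common definition of
    uniformity; the source does not state a formula.
    """
    zs = [l2_normalize(v) for v in vectors]
    potentials = []
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(zs[i], zs[j]))
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))


def linear(x, weights):
    """Apply a bias-free linear layer; weights is a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def project_with_shortcut(h, w1, w2):
    """Hypothetical projection head with an additive shortcut.

    g(h) is a two-layer MLP with ReLU; the returned vector is g(h) + h,
    so the representation h feeds directly into the projection vector.
    ASSUMPTION: an additive shortcut with matching dimensions; the
    source only says that a shortcut connection is built.
    """
    hidden = [max(0.0, a) for a in linear(h, w1)]
    g = linear(hidden, w2)
    return [gi + hi for gi, hi in zip(g, h)]


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    # Random weights stand in for a trained projection head.
    w1 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
    w2 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
    # Random vectors stand in for encoder representations.
    batch = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]
    projections = [project_with_shortcut(h, w1, w2) for h in batch]
    print("uniformity of representations:", uniformity(batch))
    print("uniformity of projections:    ", uniformity(projections))
```

In an actual SSL pipeline, the contrastive or similarity loss would be computed on the output of `project_with_shortcut`, while downstream tasks would use the representation `h`. With the additive form assumed here, gradients from the loss also reach `h` directly through the shortcut rather than only through the projection MLP.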