CV · Aug 25, 2023

Self-Supervised Representation Learning with Cross-Context Learning between Global and Hypercolumn Features

arXiv:2308.13392v2 · 6 citations · h-index: 10
AI Analysis

This work targets self-supervised representation learning in computer vision. It offers an incremental improvement: it extends existing distillation frameworks that learn only from global features by adding cross-context learning with intermediate-layer features.

The paper tackles contrastive learning's inability to capture similarities between different instances. It proposes a self-supervised framework that enforces consistency of instance relations between low- and high-level semantics through cross-context learning between global and hypercolumn features. The authors report that it outperforms state-of-the-art methods on linear classification and downstream tasks.

Whilst contrastive learning yields powerful representations by matching different augmented views of the same instance, it lacks the ability to capture the similarities between different instances. One popular way to address this limitation is to learn global features (after global pooling) that capture inter-instance relationships through knowledge distillation, where the global features of the teacher guide the learning of the global features of the student. Inspired by cross-modality learning, we extend this existing framework, which learns only from global features, by encouraging the global features and intermediate-layer features to learn from each other. This leads to our novel self-supervised framework, cross-context learning between global and hypercolumn features (CGH), which enforces the consistency of instance relations between low- and high-level semantics. Specifically, we stack the intermediate feature maps to construct a hypercolumn representation so that we can measure instance relations using two contexts (hypercolumn and global feature) separately, and then use the relations of one context to guide the learning of the other. This cross-context learning allows the model to learn from the differences between the two contexts. The experimental results on linear classification and downstream tasks show that our method outperforms the state-of-the-art methods.
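The cross-context idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The abstract does not specify several details, so the following are assumptions made here: intermediate feature maps are average-pooled and concatenated to form the hypercolumn, instance relations are a softmax over cosine similarities to a set of anchor instances, and the temperatures `t_temp` and `s_temp` are arbitrary. All function and parameter names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def hypercolumn(feature_maps):
    # ASSUMPTION: pool each (N, C_i, H_i, W_i) map over space, then concatenate
    # along channels to get one (N, sum(C_i)) vector per instance.
    pooled = [fm.mean(axis=(2, 3)) for fm in feature_maps]
    return np.concatenate(pooled, axis=1)

def relation(features, anchors, temperature):
    # Instance relations: softmax over similarities between each instance
    # and a set of anchor instances (ASSUMPTION: e.g. a memory bank).
    logits = l2_normalize(features) @ l2_normalize(anchors).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def cross_entropy(target, pred, eps=1e-12):
    # Mean cross-entropy between target and predicted relation distributions.
    return float(-(target * np.log(pred + eps)).sum(axis=1).mean())

def cross_context_loss(t_global, t_hyper, s_global, s_hyper,
                       anchors_global, anchors_hyper,
                       t_temp=0.04, s_temp=0.1):
    # Teacher relations in one context guide student relations in the other.
    p_t_global = relation(t_global, anchors_global, t_temp)
    p_t_hyper = relation(t_hyper, anchors_hyper, t_temp)
    p_s_global = relation(s_global, anchors_global, s_temp)
    p_s_hyper = relation(s_hyper, anchors_hyper, s_temp)
    return (cross_entropy(p_t_global, p_s_hyper)
            + cross_entropy(p_t_hyper, p_s_global))

# Toy usage with random arrays standing in for network outputs.
rng = np.random.default_rng(0)
n, k = 4, 16  # batch size, number of anchor instances
maps = [rng.normal(size=(n, 8, 6, 6)), rng.normal(size=(n, 16, 3, 3))]
hyper = hypercolumn(maps)  # shape (4, 24)
loss = cross_context_loss(
    t_global=rng.normal(size=(n, 32)), t_hyper=hyper,
    s_global=rng.normal(size=(n, 32)), s_hyper=hyper + 0.1 * rng.normal(size=hyper.shape),
    anchors_global=rng.normal(size=(k, 32)), anchors_hyper=rng.normal(size=(k, 24)),
)
print(loss)
```

In a real training loop the teacher outputs would be treated as fixed targets (no gradient), and the relations in each context would be computed against anchors drawn from that same context, as done here with separate `anchors_global` and `anchors_hyper`.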

