CV · Nov 25, 2021

Semantic-Aware Generation for Self-Supervised Visual Representation Learning

arXiv:2111.13163v1 · 19 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in self-supervised learning for computer vision, offering an incremental improvement by enhancing semantic preservation during generation.

The paper tackles semantic degradation in self-supervised visual representation learning. It proposes Semantic-aware Generation (SaGe), which uses a pre-trained evaluator so that image generation is judged by semantic similarity rather than pixel-level similarity. The authors report stronger visual representations when pre-training on ImageNet-1K and evaluating on five downstream tasks.

In this paper, we propose a self-supervised visual representation learning approach which involves both generative and discriminative proxies, where we focus on the former part by requiring the target network to recover the original image based on the mid-level features. Different from prior work that mostly focuses on pixel-level similarity between the original and generated images, we advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image. The core idea of implementing SaGe is to use an evaluator, a deep network that is pre-trained without labels, for extracting semantic-aware features. SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations. We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition, demonstrating its ability to learn stronger visual representations.
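The contrast the abstract draws between pixel-level and semantic-level similarity can be illustrated with a small sketch. This is not the paper's loss or architecture: the abstract does not specify either, so everything here is an assumption made for illustration. The "evaluator" is a hypothetical frozen random projection with ReLU standing in for a network pre-trained without labels, cosine distance is an assumed choice of feature metric, and the names `make_evaluator`, `pixel_loss`, and `semantic_loss` are invented for this example.

```python
import math
import random


def make_evaluator(in_dim, feat_dim, seed=0):
    """Toy frozen evaluator (hypothetical): fixed random linear map + ReLU.

    Stands in for a deep network pre-trained without labels; its weights
    are never updated.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
               for _ in range(feat_dim)]

    def evaluator(image):
        # One feature per weight row: ReLU of the dot product with the image.
        return [max(0.0, sum(w * x for w, x in zip(row, image)))
                for row in weights]

    return evaluator


def pixel_loss(original, generated):
    """Mean squared error over raw pixel values."""
    return sum((o - g) ** 2 for o, g in zip(original, generated)) / len(original)


def cosine_distance(u, v):
    """1 - cosine similarity, with a guard for all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0 if norm_u == norm_v else 1.0
    return 1.0 - dot / (norm_u * norm_v)


def semantic_loss(evaluator, original, generated):
    """Distance between evaluator features of the two images (assumed metric)."""
    return cosine_distance(evaluator(original), evaluator(generated))


if __name__ == "__main__":
    evaluator = make_evaluator(in_dim=4, feat_dim=8)
    original = [0.2, 0.8, 0.5, 0.1]          # a flattened 2x2 "image"
    dimmed = [0.5 * x for x in original]     # same content, half brightness

    print("pixel loss:   ", pixel_loss(original, dimmed))
    print("semantic loss:", semantic_loss(evaluator, original, dimmed))
```

In this toy setup a uniformly dimmed copy gets a nonzero pixel loss but an essentially zero feature-space loss, because the ReLU projection only rescales under a brightness change and cosine distance ignores scale. A real evaluator would be invariant to different and richer changes; the sketch only shows why a feature-space objective can tolerate differences that a pixel-space objective penalizes.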

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
