CV | Nov 15, 2024

One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation

arXiv:2411.09858v2 | 14 citations | h-index: 6 | Has Code | ICML
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable, efficient pre-training methods in computer vision, though it appears incremental because it builds on existing contrastive learning and masking techniques.

The paper tackles efficient visual representation learning by proposing occluded image contrastive learning (OCL), which uses random masking to create semantic-aware views and contrastive learning to extract high-level features. A ViT-L/16 pre-trained this way in 133 hours on 4 A100 GPUs reaches 85.8% accuracy in downstream fine-tuning tasks.

This paper proposes a scalable and straightforward pre-training paradigm for efficient visual conceptual representation called occluded image contrastive learning (OCL). Our OCL approach is simple: we randomly mask patches to generate different views within an image and contrast them among a mini-batch of images. OCL rests on two design choices. First, masking tokens can significantly reduce the conceptual redundancy inherent in images and create distinct views with substantial fine-grained differences at the semantic concept level instead of the instance level. Second, contrastive learning is adept at extracting high-level semantic conceptual features during pre-training, circumventing the high-frequency interference and additional costs associated with image reconstruction. Importantly, OCL learns highly semantic conceptual representations efficiently without relying on hand-crafted data augmentations or additional auxiliary modules. Empirically, OCL demonstrates high scalability with Vision Transformers: a ViT-L/16 can complete pre-training in 133 hours using only 4 A100 GPUs, achieving 85.8% accuracy in downstream fine-tuning tasks. Code is available at https://anonymous.4open.science/r/OLRS/.
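
The mask-then-contrast recipe described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the two views are disjoint random subsets of patch tokens, replaces the ViT encoder with a hypothetical mean-pool plus linear projection, and uses an InfoNCE-style loss, which the abstract does not specify. The names `random_mask_views`, `encode`, `info_nce`, and the values `keep_ratio=0.25` and `temperature=0.1` are illustrative assumptions.

```python
import numpy as np


def random_mask_views(tokens, keep_ratio, rng):
    """Return two views per image, each a random subset of its patch tokens.

    tokens: array of shape (batch, num_patches, dim).
    Assumption: the two views are disjoint; the source only says patches
    are masked at random to create different views.
    """
    batch, num_patches, _ = tokens.shape
    n_keep = int(num_patches * keep_ratio)
    assert 0 < n_keep and 2 * n_keep <= num_patches
    # Independent random permutation of patch indices for every image.
    perm = np.argsort(rng.random((batch, num_patches)), axis=1)
    idx_a = perm[:, :n_keep]
    idx_b = perm[:, n_keep:2 * n_keep]
    view_a = np.take_along_axis(tokens, idx_a[:, :, None], axis=1)
    view_b = np.take_along_axis(tokens, idx_b[:, :, None], axis=1)
    return view_a, view_b


def encode(view, weight):
    """Hypothetical stand-in for a ViT: mean-pool, project, L2-normalize."""
    z = view.mean(axis=1) @ weight
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: row i of z_a should match row i of z_b,
    with the other images in the mini-batch acting as negatives."""
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 196, 32))   # 8 images, 14x14 patches, toy dim 32
weight = rng.normal(size=(32, 16))       # toy projection weights
view_a, view_b = random_mask_views(tokens, keep_ratio=0.25, rng=rng)
loss = info_nce(encode(view_a, weight), encode(view_b, weight))
print(view_a.shape, view_b.shape, float(loss))
```

Because the encoder only ever sees the visible tokens, the cost per image scales with the kept fraction rather than the full patch grid, and no decoder is needed to reconstruct the masked patches, which is consistent with the efficiency argument in the abstract.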

