CV | AI | Mar 22, 2022

CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation

arXiv:2203.11709v2 | 42 citations | h-index: 134
AI Analysis

The paper addresses the need for pretraining methods better suited to semantic segmentation, a dense prediction task that image-level pretraining serves poorly.

The paper tackles a limitation of self-supervised contrastive learning: it neglects the pixel-level detail needed for dense prediction tasks such as semantic segmentation. It proposes CP2, a pixel-wise contrastive learning method that, after finetuning on PASCAL VOC 2012, reaches 78.6% mIoU with ResNet-50 and 79.5% with ViT-S.

Recent advances in self-supervised contrastive learning yield good image-level representation, which favors classification tasks but usually neglects pixel-level detailed information, leading to unsatisfactory transfer performance to dense prediction tasks such as semantic segmentation. In this work, we propose a pixel-wise contrastive learning method called CP2 (Copy-Paste Contrastive Pretraining), which facilitates both image- and pixel-level representation learning and therefore is more suitable for downstream dense prediction tasks. In detail, we copy-paste a random crop from an image (the foreground) onto different background images and pretrain a semantic segmentation model with the objective of 1) distinguishing the foreground pixels from the background pixels, and 2) identifying the composed images that share the same foreground. Experiments show the strong performance of CP2 in downstream semantic segmentation: by finetuning CP2 pretrained models on PASCAL VOC 2012, we obtain 78.6% mIoU with a ResNet-50 and 79.5% with a ViT-S.
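The copy-paste step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `copy_paste` and `make_pair`, the rectangular crop, and the uniform random placement are assumptions made for illustration, and the sketch omits any augmentation or training code. It only shows how two composed views sharing one foreground, plus their foreground masks, might be built.

```python
import numpy as np


def copy_paste(foreground, background, top, left):
    """Paste `foreground` (h, w, C) onto a copy of `background` (H, W, C).

    Returns the composed image and a binary mask (H, W) that is 1 on
    pasted (foreground) pixels and 0 on background pixels.
    """
    h, w = foreground.shape[:2]
    H, W = background.shape[:2]
    if not (0 <= top <= H - h and 0 <= left <= W - w):
        raise ValueError("foreground does not fit at the requested position")
    composed = background.copy()
    composed[top:top + h, left:left + w] = foreground
    mask = np.zeros((H, W), dtype=np.uint8)
    mask[top:top + h, left:left + w] = 1
    return composed, mask


def make_pair(image, bg_a, bg_b, crop_size, rng):
    """Build two composed views that share one random foreground crop.

    Hypothetical helper: crops `image` once, then pastes that crop at an
    independent random position on each of the two backgrounds.
    """
    ch, cw = crop_size
    H, W = image.shape[:2]
    y = rng.integers(0, H - ch + 1)  # upper bound is exclusive
    x = rng.integers(0, W - cw + 1)
    crop = image[y:y + ch, x:x + cw]
    views = []
    for bg in (bg_a, bg_b):
        top = rng.integers(0, bg.shape[0] - ch + 1)
        left = rng.integers(0, bg.shape[1] - cw + 1)
        views.append(copy_paste(crop, bg, top, left))
    return views


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    bg_a = np.zeros((64, 64, 3), dtype=np.uint8)
    bg_b = np.full((64, 64, 3), 128, dtype=np.uint8)
    (view_a, mask_a), (view_b, mask_b) = make_pair(image, bg_a, bg_b, (16, 16), rng)
    print(view_a.shape, int(mask_a.sum()), int(mask_b.sum()))
```

In this sketch the masks would supply the targets for the first objective (separating foreground from background pixels), while the fact that both views come from the same crop would supply the positive pair for the second (identifying composed images that share a foreground).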

Code Implementations: 1 repo