CV · AI · May 10, 2022

CoDo: Contrastive Learning with Downstream Background Invariance for Detection

arXiv:2205.04617v1 · 6 citations · h-index: 5
Originality: Highly original
AI Analysis

This paper addresses the performance gap that self-supervised pretraining shows when transferred to object detection; the contribution is an incremental improvement over prior methods.

The paper tackles the degraded transfer performance of self-supervised representations, learned through image-level pretext tasks, when they are applied to object detection. It proposes CoDo, an object-level method that focuses on background invariance and yields strong transfer results on MSCOCO with a ResNet50-FPN backbone.

Prior self-supervised learning research mainly adopts image-level instance discrimination as the pretext task. This achieves classification performance comparable to supervised learning methods, but transfer performance degrades on downstream tasks such as object detection. To bridge this gap, we propose a novel object-level self-supervised learning method called Contrastive learning with Downstream background invariance (CoDo). The pretext task is reformulated to focus on modeling instance locations against varied backgrounds, especially backgrounds from downstream datasets, because background invariance is considered vital for object detection. First, we propose a data augmentation strategy that pastes instances onto background images and then jitters the bounding boxes so that they include background information. Second, we align the architecture of our pretraining network with mainstream detection pipelines. Third, we design hierarchical and multi-view contrastive learning to improve visual representation learning. Experiments on MSCOCO demonstrate that CoDo with a common backbone, ResNet50-FPN, yields strong transfer learning results for object detection.
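A minimal sketch, in dependency-free Python, of two ingredients the abstract names: pasting an instance onto a background image and jittering its bounding box, and a contrastive objective between two views of the same instance. Everything here is an illustrative assumption: the function names (`paste_instance`, `jitter_box`, `info_nce`), the jitter ranges, the temperature, and the use of the standard InfoNCE loss are not taken from the paper, and the hierarchical, multi-view design and the architecture alignment with detection pipelines are not reproduced.

```python
import math
import random


def paste_instance(background, instance, top, left):
    """Paste a 2D `instance` patch onto a copy of a 2D `background` at (top, left).

    Returns the composited image and the instance's tight box as (x0, y0, x1, y1).
    """
    out = [row[:] for row in background]  # leave the caller's background untouched
    h, w = len(instance), len(instance[0])
    for i in range(h):
        for j in range(w):
            out[top + i][left + j] = instance[i][j]
    return out, (left, top, left + w, top + h)


def jitter_box(box, img_w, img_h, max_shift=0.2, max_enlarge=0.3, rng=random):
    """Randomly shift and enlarge a box so the crop includes surrounding background.

    Hypothetical parameters: `max_shift` is a fraction of the box size and
    `max_enlarge` is the maximum relative growth of width and height.
    The result is clipped to the image bounds.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    cx = (x0 + x1) / 2 + rng.uniform(-max_shift, max_shift) * w
    cy = (y0 + y1) / 2 + rng.uniform(-max_shift, max_shift) * h
    w *= 1 + rng.uniform(0, max_enlarge)
    h *= 1 + rng.uniform(0, max_enlarge)
    return (max(0.0, cx - w / 2), max(0.0, cy - h / 2),
            min(float(img_w), cx + w / 2), min(float(img_h), cy + h / 2))


def info_nce(queries, keys, temperature=0.2):
    """InfoNCE loss: queries[i] and keys[i] embed two views of the same instance,
    and every other key in the batch acts as a negative."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i with every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the positive at index i
    return total / len(q)


if __name__ == "__main__":
    bg = [[0] * 8 for _ in range(8)]
    inst = [[1, 1], [1, 1]]
    img, box = paste_instance(bg, inst, top=3, left=2)
    print(box)  # (2, 3, 4, 5)
    print(jitter_box(box, 8, 8, rng=random.Random(0)))
    print(info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]))
```

In a full pipeline the jittered box would presumably be used to crop or pool features from each augmented view, so the two positives share an instance but differ in background. In this sketch the enlargement factor is never below 1, so some pasted background tends to enter the crop; whether CoDo also shrinks boxes is not stated in the abstract.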
