CV · Mar 19, 2021

Efficient Visual Pretraining with Contrastive Detection

arXiv:2103.10957v2 · 188 citations
AI Analysis

This paper addresses the computational bottleneck in self-supervised learning for computer vision, making pretraining for transfer learning more efficient.

The paper tackles the high computational cost of self-supervised pretraining by introducing a new objective, contrastive detection. The objective achieves state-of-the-art transfer accuracy on a variety of downstream tasks while requiring up to 10x less pretraining, and the strongest ImageNet-pretrained model performs on par with SEER, a system that uses 1000x more pretraining data.

Self-supervised pretraining has been shown to yield powerful representations for transfer learning. These performance gains come at a large computational cost however, with state-of-the-art methods requiring an order of magnitude more computation than supervised pretraining. We tackle this computational bottleneck by introducing a new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations. This objective extracts a rich learning signal per image, leading to state-of-the-art transfer accuracy on a variety of downstream tasks, while requiring up to 10x less pretraining. In particular, our strongest ImageNet-pretrained model performs on par with SEER, one of the largest self-supervised systems to date, which uses 1000x more pretraining data. Finally, our objective seamlessly handles pretraining on more complex images such as those in COCO, closing the gap with supervised transfer learning from COCO to PASCAL.
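As a rough illustration of what "identifying object-level features across augmentations" could look like, the sketch below pools dense features inside region masks for two augmented views and applies a contrastive loss that asks each region in one view to pick out the same region in the other. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not give the loss formula, so the InfoNCE-style form, the temperature value, the availability of region masks, and all function names (`mask_pool`, `object_contrastive_loss`) are hypothetical.

```python
import numpy as np


def mask_pool(features, masks):
    """Average the feature vectors that fall inside each region mask.

    features: (P, D) array, one D-dim feature per spatial position.
    masks:    (M, P) binary array, one row per (assumed given) region.
    Returns an (M, D) array of region-level features.
    """
    masks = masks.astype(float)
    # Guard against empty masks to avoid division by zero.
    weights = masks / np.maximum(masks.sum(axis=1, keepdims=True), 1.0)
    return weights @ features


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def object_contrastive_loss(pooled_a, pooled_b, temperature=0.1):
    """InfoNCE-style loss over regions (an assumed form, not the paper's).

    Region i in view A should match region i in view B; every other
    region in view B acts as a negative.
    """
    a = l2_normalize(pooled_a)
    b = l2_normalize(pooled_b)
    logits = a @ b.T / temperature                        # (M, M) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Toy usage: 16 positions, 4 regions of 4 positions each, 8-dim features.
rng = np.random.default_rng(0)
masks = np.kron(np.eye(4), np.ones((1, 4)))               # (4, 16) region masks
prototypes = np.eye(4, 8)                                 # one direction per region
base = masks.T @ prototypes                               # (16, 8) dense features
view_a = base + 0.01 * rng.standard_normal(base.shape)    # stand-in for augmentation 1
view_b = base + 0.01 * rng.standard_normal(base.shape)    # stand-in for augmentation 2

loss = object_contrastive_loss(mask_pool(view_a, masks), mask_pool(view_b, masks))
print(loss)
```

Pooling per region yields several training targets per image rather than one, which is one plausible reading of the abstract's claim that the objective "extracts a rich learning signal per image".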
