CV | Nov 27, 2020

Self-EMD: Self-Supervised Object Detection without ImageNet

arXiv:2011.13677v3 | 22.198 citations

Originality: Highly original

AI Analysis

This work addresses the problem of pre-training object detectors without relying on large, curated iconic image datasets like ImageNet, which is significant for researchers and practitioners working with domain-specific or less curated image data.

This paper introduces Self-EMD, a self-supervised method for object detection that trains directly on unlabeled non-iconic image datasets like COCO. The method achieves 39.8% mAP on COCO with a Faster R-CNN (ResNet50-FPN) baseline, matching state-of-the-art self-supervised methods pre-trained on ImageNet, and can reach 40.4% mAP with more unlabeled data.

In this paper, we propose a novel self-supervised representation learning method, Self-EMD, for object detection. Our method is trained directly on unlabeled non-iconic image datasets like COCO, instead of the commonly used iconic-object image datasets like ImageNet. We keep the convolutional feature maps as the image embedding to preserve spatial structure and adopt the Earth Mover's Distance (EMD) to compute the similarity between two embeddings. Our Faster R-CNN (ResNet50-FPN) baseline achieves 39.8% mAP on COCO, which is on par with state-of-the-art self-supervised methods pre-trained on ImageNet. More importantly, it can be further improved to 40.4% mAP with more unlabeled images, showing its great potential for leveraging more easily obtained unlabeled data. Code will be made available.
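The idea of comparing two spatial feature maps with an optimal-transport similarity, rather than pooling each to a single vector, can be illustrated with a small sketch. This is not the authors' implementation. It assumes uniform weights over spatial locations, a cost of one minus cosine similarity, and an entropic (Sinkhorn) approximation in place of an exact EMD solver, none of which the abstract specifies. The function names and the `eps` and `n_iters` parameters are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def flatten_feature_map(fmap):
    """Turn a nested [H][W][C] feature map into a list of H*W vectors."""
    return [vec for row in fmap for vec in row]


def sinkhorn_emd_similarity(fmap_a, fmap_b, eps=0.05, n_iters=200):
    """Approximate EMD-style similarity between two feature maps.

    Assumptions (not taken from the paper): uniform marginals over
    locations, cost = 1 - cosine similarity, Sinkhorn iterations.
    Returns (similarity score, transport plan).
    """
    xs = flatten_feature_map(fmap_a)
    ys = flatten_feature_map(fmap_b)
    n, m = len(xs), len(ys)

    # Pairwise similarity between every location in A and every location in B.
    sim = [[cosine(x, y) for y in ys] for x in xs]
    # Gibbs kernel of the cost matrix (cost = 1 - similarity).
    kernel = [[math.exp(-(1.0 - s) / eps) for s in row] for row in sim]

    row_mass = [1.0 / n] * n  # uniform weight per location in A
    col_mass = [1.0 / m] * m  # uniform weight per location in B
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]

    # Transport plan: how much mass moves from location i to location j.
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    score = sum(plan[i][j] * sim[i][j] for i in range(n) for j in range(m))
    return score, plan
```

In this sketch, two crops that contain the same local features at different positions still score close to 1, because the transport plan is free to match each location to its best counterpart. That behavior is the motivation for keeping the spatial structure in the embedding.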
