CV · Apr 25, 2020

Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection

Dongzhan Zhou, Xinchi Zhou, Hongwen Zhang, Shuai Yi, Wanli Ouyang

arXiv:2004.12178v2 · 12.016 citations

Originality: Incremental advance

AI Analysis

The paper addresses the cost of pre-training that object detection researchers and practitioners face, and offers an incremental efficiency improvement.

The paper tackles the high computational cost of ImageNet pre-training for object detection by proposing Montage pre-training, which uses only the target dataset and requires 1/4 of the resources while achieving on-par or better performance on MS-COCO.

In this paper, we propose a general and efficient pre-training paradigm, Montage pre-training, for object detection. Montage pre-training needs only the target detection dataset while taking only 1/4 computational resources compared to the widely adopted ImageNet pre-training. To build such an efficient paradigm, we reduce the potential redundancy by carefully extracting useful samples from the original images, assembling samples in a Montage manner as input, and using an ERF-adaptive dense classification strategy for model pre-training. These designs include not only a new input pattern to improve the spatial utilization but also a novel learning objective to expand the effective receptive field of the pretrained model. The efficiency and effectiveness of Montage pre-training are validated by extensive experiments on the MS-COCO dataset, where the results indicate that the models using Montage pre-training are able to achieve on-par or even better detection performances compared with the ImageNet pre-training.
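The "assembling samples in a Montage manner" step can be pictured with a minimal sketch. It assumes four equally sized, single-channel patches tiled into a plain 2x2 grid; the abstract does not specify the actual layout, patch sizes, or extraction procedure, so the function name `assemble_montage` and the grid arrangement are hypothetical simplifications rather than the authors' method.

```python
from typing import List

Patch = List[List[int]]  # a single-channel image patch as rows of pixel values


def assemble_montage(patches: List[Patch]) -> Patch:
    """Tile four equally sized patches into one 2x2 montage image.

    Hypothetical simplification: a plain grid is assumed here because the
    abstract does not describe the layout used in the paper.
    """
    if len(patches) != 4:
        raise ValueError("expected exactly four patches")
    height, width = len(patches[0]), len(patches[0][0])
    for patch in patches:
        if len(patch) != height or any(len(row) != width for row in patch):
            raise ValueError("all patches must share the same size")
    top_left, top_right, bottom_left, bottom_right = patches
    # Concatenate rows side by side, then stack the top and bottom halves.
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


# Four 2x2 patches, each filled with its own label so the layout is visible.
patches = [[[label] * 2 for _ in range(2)] for label in range(4)]
montage = assemble_montage(patches)
```

Packing several object samples into one input is what the abstract calls improving spatial utilization: each montage carries four labeled regions instead of one, so a dense classification objective can supervise all of them in a single forward pass.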
