Label-Efficient Object Detection via Region Proposal Network Pre-Training
This work addresses the problem of reducing annotation costs for object detection. The contribution is incremental: it extends self-supervised pre-training to include the RPN module.
The paper tackles the problem of label-efficient object detection by proposing a pretext task to pre-train the region proposal network (RPN), reducing the need to train detection-specific modules from scratch. This yields consistent performance improvements on downstream tasks such as object detection, instance segmentation, and few-shot detection, with the largest gains in label-scarce settings.
Self-supervised pre-training, based on the pretext task of instance discrimination, has fueled recent advances in label-efficient object detection. However, existing studies focus on pre-training only a feature extractor network to learn transferable representations for downstream detection tasks. As a result, multiple detection-specific modules must be trained from scratch in the fine-tuning phase. We argue that the region proposal network (RPN), a common detection-specific module, can additionally be pre-trained to reduce the localization error of multi-stage detectors. In this work, we propose a simple pretext task that provides effective pre-training for the RPN and efficiently improves downstream object detection performance. We evaluate the efficacy of our approach on benchmark object detection tasks and additional downstream tasks, including instance segmentation and few-shot detection. In comparison with multi-stage detectors without RPN pre-training, our approach consistently improves downstream task performance, with the largest gains found in label-scarce settings.