Cascade Region Proposal and Global Context for Deep Object Detection
This work addresses object detection accuracy in computer vision, presenting an incremental improvement over the Faster R-CNN baseline.
The paper improves deep object detection by strengthening region proposal with a lightweight cascade structure and by re-implementing global context modeling for object recognition. The global context change yields a 4.2% mAP gain on the ILSVRC 2016 validation set, and the full system reaches 87.9% mAP on the PASCAL VOC 2012 test set.
A deep region-based object detector consists of a region proposal step and a deep object recognition step. In this paper, we make significant improvements to both steps. For region proposal, we propose a novel lightweight cascade structure that effectively improves RPN proposal quality. For object recognition, we re-implement global context modeling with a few modifications and obtain a performance boost (4.2% mAP gain on the ILSVRC 2016 validation set). In addition, we apply pre-training extensively and show its importance in both steps. Together with common training and testing tricks, we improve the Faster R-CNN baseline by a large margin. In particular, we obtain 87.9% mAP on the PASCAL VOC 2012 test set, 65.3% on the ILSVRC 2016 test set, and 36.8% on the COCO test-std set.
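To make the idea of a cascade over RPN proposals concrete, the following is a minimal sketch and not the structure proposed in the paper, whose details are not given in this abstract. It assumes a generic two-stage filter: the first stage keeps the highest-scoring RPN proposals, and a second, cheap scorer re-ranks the survivors. The names `cascade_filter`, `rescore`, `keep_stage1`, and `keep_stage2` are hypothetical, and the area-based scorer is a toy stand-in for a learned lightweight classifier.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Proposal = Tuple[Box, float]             # (box, objectness score)


def cascade_filter(
    proposals: List[Proposal],
    rescore: Callable[[Box], float],
    keep_stage1: int,
    keep_stage2: int,
) -> List[Proposal]:
    """Generic two-stage proposal filter (illustrative assumption only)."""
    # Stage 1: keep the proposals with the highest RPN objectness scores.
    stage1 = sorted(proposals, key=lambda p: p[1], reverse=True)[:keep_stage1]
    # Stage 2: re-score the survivors with a second, cheap scorer
    # and keep the best of those.
    rescored = [(box, rescore(box)) for box, _ in stage1]
    return sorted(rescored, key=lambda p: p[1], reverse=True)[:keep_stage2]


def toy_rescore(box: Box) -> float:
    # Hypothetical stand-in for a learned second-stage classifier:
    # it simply scores a box by its area.
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


proposals: List[Proposal] = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),
    ((0.0, 0.0, 2.0, 2.0), 0.8),
    ((5.0, 5.0, 50.0, 50.0), 0.7),
    ((1.0, 1.0, 3.0, 3.0), 0.1),
]
result = cascade_filter(proposals, toy_rescore, keep_stage1=3, keep_stage2=2)
```

In this sketch the second stage only sees the proposals that survive the first, which is what keeps a cascade cheap: the more expensive scorer runs on a small subset of candidates.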