PBRnet: Pyramidal Bounding Box Refinement to Improve Object Localization Accuracy
This work addresses the need for more precise object localization in computer vision, offering incremental improvements to existing detectors like FPN and Cascade R-CNN.
The paper proposes PBRnet, a pyramidal bounding box refinement architecture that combines a coarse-to-fine framework with a feature pyramid structure to improve localization accuracy in object detection, yielding performance gains of up to 5 mAP points on the MS-COCO dataset.
Many recently developed object detectors focus on a coarse-to-fine framework, which contains several stages that classify and regress proposals from coarse-grained to fine-grained and gradually obtain more accurate detections. Multi-resolution models such as the Feature Pyramid Network (FPN) integrate information from different levels of resolution and effectively improve performance. Previous research has also revealed that localization can be further improved by: 1) using fine-grained information, which is more translation-variant; 2) refining local areas, which focuses more on local boundary information. Based on these principles, we design a novel boundary refinement architecture, named the Pyramidal Bounding Box Refinement network (PBRnet), that improves localization accuracy by combining a coarse-to-fine framework with a feature pyramid structure. PBRnet parameterizes gradually focused boundary areas of objects and leverages lower-level feature maps to extract finer local information when refining the predicted bounding boxes. Extensive experiments are performed on the MS-COCO dataset. PBRnet brings a significant performance gain of roughly 3 points of $mAP$ when added to FPN or Libra R-CNN. Moreover, treating Cascade R-CNN as a coarse-to-fine detector and replacing its localization branch with the regressor of PBRnet leads to an extra improvement of 1.5 $mAP$, yielding a total performance boost of as much as 5 points of $mAP$.
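The idea of gradually focused boundary areas paired with lower-level feature maps can be illustrated with a minimal sketch. This is not the PBRnet implementation: the function name `boundary_regions`, the `BoundaryRegion` type, the shrink ratios, and the strides are all hypothetical choices made for illustration, and the abstract does not specify the actual parameterization. The sketch only shows the geometry: at each stage the strip around each box edge gets narrower, and a finer (lower-stride) pyramid level is assigned to it.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


@dataclass
class BoundaryRegion:
    side: str    # "left", "right", "top", or "bottom"
    stage: int   # refinement stage, 0 = coarsest
    stride: int  # stride of the feature map assumed for this stage
    box: Box     # strip around one edge of the input box


def boundary_regions(
    box: Box,
    ratios: Sequence[float] = (0.5, 0.25, 0.125),  # assumed strip widths, as a fraction of box size
    strides: Sequence[int] = (16, 8, 4),           # assumed pyramid strides, coarse to fine
) -> List[BoundaryRegion]:
    """Return one strip per box edge per stage, narrowing as the stage index grows."""
    if len(ratios) != len(strides):
        raise ValueError("ratios and strides must have the same length")
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    regions: List[BoundaryRegion] = []
    for stage, (ratio, stride) in enumerate(zip(ratios, strides)):
        half_w = 0.5 * ratio * w  # half-width of the vertical strips
        half_h = 0.5 * ratio * h  # half-height of the horizontal strips
        strips = {
            "left": (x1 - half_w, y1, x1 + half_w, y2),
            "right": (x2 - half_w, y1, x2 + half_w, y2),
            "top": (x1, y1 - half_h, x2, y1 + half_h),
            "bottom": (x1, y2 - half_h, x2, y2 + half_h),
        }
        for side, strip in strips.items():
            regions.append(BoundaryRegion(side, stage, stride, strip))
    return regions


if __name__ == "__main__":
    for region in boundary_regions((0.0, 0.0, 100.0, 200.0)):
        print(region)
```

In a real detector, each strip would be pooled from the feature map with the matching stride (for example with an RoI pooling operator) and passed to a regressor that adjusts the corresponding edge. That step is omitted here because the abstract does not describe it.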