Yes-Net: An effective Detector Based on Global Information
This is an incremental improvement for real-time object detection systems, offering efficiency gains over methods like YOLOv2 and SSD.
The paper tackles real-time object detection by introducing Yes-Net, which combines a CNN with an RNN to capture global information and replaces NMS with an RNN-based filter, achieving 79.2% mAP on VOC2007 at 39 FPS.
This paper introduces Yes-Net, a new approach to real-time object detection. Like YOLOv2 and SSD, it predicts bounding boxes and classes with a single neural network, but it adds several features aimed at efficiency and accuracy. It combines local and global information by embedding an RNN architecture as a packed unit in the CNN model to form the basic feature extractor. Yes-Net also uses independent anchor boxes obtained from full-dimension k-means, which yield a better average IoU than grid anchor boxes. In addition, instead of non-maximum suppression (NMS), Yes-Net uses an RNN as a filter to select the final boxes, which is more efficient. For 416 x 416 input, Yes-Net achieves 79.2% mAP on the VOC2007 test set at 39 FPS on an Nvidia Titan X Pascal.
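
The abstract does not spell out how the full-dimension k-means is run, so the following is only a minimal sketch of one plausible reading: boxes are clustered on all four dimensions (center x, center y, width, height, normalized to the image) rather than on width and height alone, and a box is assigned to the center it overlaps most. The function names `iou`, `kmeans_anchors`, and `average_iou`, the IoU-based assignment, and the mean-based center update are assumptions made for illustration, not the authors' implementation.

```python
import random


def iou(a, b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (cx, cy, w, h) boxes into k anchors (assumed procedure).

    Each box joins the center with the highest IoU. A box that overlaps
    no center falls back to the first center, a simplification of this
    sketch.
    """
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: iou(box, centers[j]))
            clusters[best].append(box)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                # Update all four dimensions with the cluster mean.
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(4)
                ))
            else:
                # Keep the old center if the cluster is empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def average_iou(boxes, anchors):
    """Mean over boxes of the best IoU with any anchor."""
    return sum(max(iou(b, a) for a in anchors) for b in boxes) / len(boxes)
```

Calling `kmeans_anchors(boxes, k)` on a list of normalized ground-truth boxes and then `average_iou(boxes, anchors)` gives the kind of average-IoU figure the abstract compares against grid anchor boxes. Because position is part of the clustering, the resulting anchors are tied to image locations, which is what distinguishes them from width-and-height-only anchors shared across grid cells.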