DeepBox: Learning Objectness with Convolutional Networks
This work addresses object detection efficiency in computer vision by improving how object proposals are ranked, an incremental improvement over existing bottom-up methods.
The paper tackles the problem of ranking object proposals by introducing DeepBox, a convolutional neural network that reranks proposals from bottom-up methods. With 500 proposals it achieves the same recall that bottom-up methods reach with 2000, and it yields a 4.5-point gain in detection mAP.
Existing object proposal approaches use primarily bottom-up cues to rank proposals, while we believe that objectness is in fact a high-level construct. We argue for a data-driven, semantic approach for ranking object proposals. Our framework, which we call DeepBox, uses convolutional neural networks (CNNs) to rerank proposals from a bottom-up method. We use a novel four-layer CNN architecture that is as good as much larger networks on the task of evaluating objectness while being much faster. We show that DeepBox significantly improves over the bottom-up ranking, achieving the same recall with 500 proposals as achieved by bottom-up methods with 2000. This improvement generalizes to categories the CNN has never seen before and leads to a 4.5-point gain in detection mAP. Our implementation achieves this performance while running at 260 ms per image.
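To make the reranking idea concrete, the following is a minimal sketch of how reordering proposals by an objectness score can raise recall at a fixed proposal budget. It is not the DeepBox implementation: the scores are made-up stand-ins for what a CNN might output, and the helper names (`iou`, `rerank`, `recall_at_k`), the toy boxes, and the IoU threshold of 0.5 are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def rerank(proposals: Sequence[Box], scores: Sequence[float]) -> List[Box]:
    """Return proposals sorted by descending objectness score."""
    order = sorted(range(len(proposals)), key=lambda i: -scores[i])
    return [proposals[i] for i in order]


def recall_at_k(ranked: Sequence[Box], gt_boxes: Sequence[Box],
                k: int, iou_thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes covered by any of the top-k proposals."""
    if not gt_boxes:
        return 0.0
    top = ranked[:k]
    hits = sum(1 for g in gt_boxes
               if any(iou(p, g) >= iou_thresh for p in top))
    return hits / len(gt_boxes)


# Toy data (hypothetical): two objects, four proposals in bottom-up order.
gt_boxes = [(10, 10, 50, 50), (60, 60, 100, 100)]
proposals = [(0, 0, 5, 5), (200, 200, 210, 210),
             (12, 12, 50, 50), (60, 60, 98, 98)]
# Stand-in objectness scores; in DeepBox these would come from the CNN.
scores = [0.10, 0.05, 0.90, 0.80]

reranked = rerank(proposals, scores)
print(recall_at_k(proposals, gt_boxes, k=2))  # 0.0 with the original order
print(recall_at_k(reranked, gt_boxes, k=2))   # 1.0 after reranking
```

In this toy case the good boxes sit at the bottom of the original ranking, so a budget of two proposals misses both objects until the scores move them to the top. The abstract's 500-versus-2000 result describes the same effect at scale.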