A convnet for non-maximum suppression
This work addresses a critical but under-researched bottleneck in object detection, though the contribution is incremental in that it builds on existing NMS frameworks.
The paper tackles non-maximum suppression (NMS) in object detection pipelines, which typically relies on a greedy method with a fixed threshold. It proposes a convolutional network that performs NMS and reports improved recall and precision on crowded pedestrian detection scenes.
Non-maximum suppression (NMS) is used in virtually all state-of-the-art object detection pipelines. While essential object detection ingredients such as features, classifiers, and proposal methods have been extensively researched, surprisingly little work has aimed to systematically address NMS. The de facto standard for NMS is based on greedy clustering with a fixed distance threshold, which forces a trade-off between recall and precision. We propose a convnet designed to perform NMS on a given set of detections. We report experiments on a synthetic setup, and results on crowded pedestrian detection scenes. Our approach overcomes the intrinsic limitations of greedy NMS, obtaining better recall and precision.
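To make the greedy baseline and its trade-off concrete, here is a minimal sketch of greedy NMS with a fixed overlap threshold. It is not the paper's code. The function names `iou` and `greedy_nms`, the use of intersection-over-union as the distance measure, and the three toy boxes are all assumptions chosen for illustration. The scene is hypothetical: boxes 0 and 1 are taken to be two different, heavily overlapping pedestrians, and box 2 is taken to be a duplicate detection of pedestrian 0.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already-kept box is at most `threshold`.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical crowded scene (illustrative values, not data from the paper).
boxes = [
    (0, 0, 10, 20),   # 0: pedestrian A
    (3, 0, 13, 20),   # 1: pedestrian B, IoU with A is about 0.54
    (0, 7, 10, 27),   # 2: duplicate of A, IoU with A is about 0.48
]
scores = [0.9, 0.8, 0.7]

strict = greedy_nms(boxes, scores, 0.4)  # suppresses B: recall is lost
loose = greedy_nms(boxes, scores, 0.6)   # keeps the duplicate: precision is lost
```

In this toy scene the two distinct pedestrians overlap more than the true detection and its duplicate do, so no single threshold returns exactly boxes 0 and 1: any value high enough to keep pedestrian B also keeps the duplicate. This is the kind of fixed-threshold limitation the abstract refers to.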