CV · Jul 12, 2016

End-to-end training of object class detectors for mean average precision

arXiv:1607.03476v2 · 332 citations
AI Analysis

This paper addresses the disconnect between training and evaluation metrics in object detection, a concern for computer vision researchers. The contribution is incremental, however, since it matches rather than surpasses existing performance.

The authors tackled the problem of training object detectors using mean average precision (mAP) as the loss function, enabling end-to-end training that includes non-maximum suppression (NMS). They achieved equivalent performance to standard Fast R-CNN on PASCAL VOC 2007 and 2012 datasets.

We present a method for training CNN-based object class detectors directly using mean average precision (mAP) as the training loss, in a truly end-to-end fashion that includes non-maximum suppression (NMS) at training time. This contrasts with the traditional approach of training a CNN for a window classification loss, then applying NMS only at test time, when mAP is used as the evaluation metric in place of classification accuracy. However, mAP following NMS forms a piecewise-constant structured loss over thousands of windows, with gradients that do not convey useful information for gradient descent. Hence, we define new, general gradient-like quantities for piecewise-constant functions, which have wide applicability. We describe how to calculate these efficiently for mAP following NMS, enabling us to train a detector based on Fast R-CNN directly for mAP. This model achieves equivalent performance to the standard Fast R-CNN on the PASCAL VOC 2007 and 2012 datasets, while being conceptually more appealing as the very same model and loss are used at both training and test time.
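
To make the "piecewise-constant" point concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names (`iou`, `greedy_nms`, `average_precision`, `ap_with_score_c`), the toy boxes, the 0.5 IoU threshold, and the use of uninterpolated single-class AP are all assumptions made for illustration. It computes AP after greedy NMS and shows that a small change to one window's score leaves AP unchanged, while a change large enough to reorder two detections makes it jump.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, thresh=0.5):
    """Return indices of boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    kept = []
    for i in order:
        # Keep a box only if it does not overlap any higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= thresh for j in kept):
            kept.append(i)
    return kept


def average_precision(boxes, scores, gt_boxes, thresh=0.5):
    """Uninterpolated AP of the detections that survive NMS (one class)."""
    kept = greedy_nms(boxes, scores, thresh)
    matched, tp, ap = set(), 0, 0.0
    for rank, i in enumerate(kept, start=1):
        # Match to the best-overlapping ground-truth box not yet claimed.
        best, best_iou = None, thresh
        for g, gt in enumerate(gt_boxes):
            overlap = iou(boxes[i], gt)
            if g not in matched and overlap >= best_iou:
                best, best_iou = g, overlap
        if best is not None:
            matched.add(best)
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / len(gt_boxes)


# Toy scene (hypothetical values): two ground-truth objects, four windows.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
boxes = [
    (0, 0, 10, 10),    # A: exact hit on the first object
    (1, 1, 11, 11),    # D: near-duplicate of A, suppressed by NMS
    (50, 50, 60, 60),  # B: false positive
    (20, 20, 30, 30),  # C: exact hit on the second object
]


def ap_with_score_c(score_c):
    """AP as a function of window C's score, all other scores fixed."""
    return average_precision(boxes, [0.9, 0.8, 0.6, score_c], gt_boxes)


base = ap_with_score_c(0.50)    # C ranked below the false positive B
nudged = ap_with_score_c(0.55)  # small change, same ranking
jumped = ap_with_score_c(0.70)  # C now outranks B
```

In this toy scene `base` and `nudged` are identical, so a finite-difference gradient with respect to C's score is zero, while `jumped` is strictly larger because C overtakes the false positive. This flat-then-jump behaviour is why ordinary gradients of mAP after NMS carry no useful training signal, which is the gap the abstract's gradient-like quantities are meant to fill.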

