Consistent optimization of AMS by logistic loss minimization
The paper provides a theoretical foundation for a practical method in high-energy physics data analysis, but the contribution is incremental: it formalizes an existing approach rather than introducing a new technique.
The paper tackles the problem of optimizing the approximate median significance (AMS) by theoretically justifying a two-stage procedure used in the Higgs Boson Machine Learning Challenge. It shows that minimizing logistic loss yields consistent AMS optimization, via a regret bound that controls the squared-AMS regret of the thresholded classifier by the logistic-loss regret of the underlying real-valued function.
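For reference, the quantities in the regret bound can be written as follows. This is a sketch in our own notation, assuming the AMS definition used in the Higgs challenge (with regularization constant b_r = 10) and labels y in {-1, +1}; s(h) and b(h) denote the weighted signal and background selected by a classifier h. The precise form of the bound (its constants and exponents) is stated in the paper and is not reproduced here.

```latex
\begin{aligned}
\mathrm{AMS}(h) &= \sqrt{2\Bigl(\bigl(s(h) + b(h) + b_r\bigr)\ln\Bigl(1 + \tfrac{s(h)}{b(h) + b_r}\Bigr) - s(h)\Bigr)}, \qquad b_r = 10,\\
\ell\bigl(y, f(x)\bigr) &= \ln\bigl(1 + e^{-y f(x)}\bigr), \qquad y \in \{-1, +1\},\\
\mathrm{Reg}_{\ell}(f) &= \mathbb{E}\bigl[\ell(y, f(x))\bigr] - \inf_{g}\,\mathbb{E}\bigl[\ell(y, g(x))\bigr],\\
\mathrm{Reg}_{\mathrm{AMS}^2}(h) &= \sup_{h'}\,\mathrm{AMS}^2(h') - \mathrm{AMS}^2(h).
\end{aligned}
```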
In this paper, we theoretically justify an approach popular among participants of the Higgs Boson Machine Learning Challenge to optimize approximate median significance (AMS). The approach is based on the following two-stage procedure. First, a real-valued function is learned by minimizing a surrogate loss for binary classification, such as the logistic loss, on the training sample. Then, a threshold is tuned on a separate validation sample by direct optimization of AMS. We show that the regret of the resulting (thresholded) classifier measured with respect to the squared AMS is upper bounded by the regret of the underlying real-valued function measured with respect to the logistic loss. Hence, we prove that minimizing the logistic surrogate loss is a consistent method of optimizing AMS.
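The two-stage procedure can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the paper's or any challenge participant's implementation: the linear score, the full-batch gradient-descent fit, the helper names (`fit_logistic`, `score`, `tune_threshold`), and the AMS formula with regularization constant b_r = 10 (the Higgs challenge definition) are all our assumptions. Labels are 1 for signal and 0 for background, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: regularization constant b_r = 10 from the Higgs challenge AMS.
B_REG = 10.0


def ams(s: float, b: float, b_reg: float = B_REG) -> float:
    """Approximate median significance for selected signal weight s and background weight b."""
    bb = b + b_reg
    if s <= 0.0:
        return 0.0
    if bb <= 0.0:
        return math.inf
    radicand = 2.0 * ((s + bb) * math.log1p(s / bb) - s)
    return math.sqrt(max(radicand, 0.0))  # clamp tiny negative rounding errors


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], c: float, x: Sequence[float]) -> float:
    """Real-valued score f(x) = w.x + c learned in stage 1 (hypothetical linear model)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + c


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Stage 1: minimize the average logistic loss by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, c = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_c = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, c, xi)) - yi  # derivative of the loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_c += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        c -= lr * grad_c / n
    return w, c


def tune_threshold(scores: Sequence[float], labels: Sequence[int],
                   weights: Sequence[float]) -> Tuple[float, float]:
    """Stage 2: on validation data, pick theta maximizing the AMS of the rule
    'classify as signal iff f(x) >= theta'. Returns (theta, best AMS)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_theta, best_value = math.inf, 0.0  # theta = +inf selects nothing
    s = b = 0.0
    for k, i in enumerate(order):
        if labels[i] == 1:
            s += weights[i]
        else:
            b += weights[i]
        # Evaluate only after all examples tied at this score have been added.
        if k + 1 < len(order) and scores[order[k + 1]] == scores[i]:
            continue
        value = ams(s, b)
        if value > best_value:
            best_theta, best_value = scores[i], value
    return best_theta, best_value


if __name__ == "__main__":
    # Hypothetical toy data: signal tends to have larger feature values.
    X_train = [[-2.0], [-1.0], [-0.5], [0.3], [-0.2], [0.5], [1.0], [2.0]]
    y_train = [0, 0, 0, 0, 1, 1, 1, 1]
    w, c = fit_logistic(X_train, y_train)

    X_val = [[-1.5], [-0.4], [0.1], [0.4], [0.8], [1.6]]
    y_val = [0, 0, 1, 0, 1, 1]
    weights_val = [5.0, 5.0, 1.0, 5.0, 1.0, 1.0]  # hypothetical event weights
    scores = [score(w, c, x) for x in X_val]
    theta, value = tune_threshold(scores, y_val, weights_val)
    print(f"w={w}, c={c:.3f}, theta={theta:.3f}, AMS={value:.3f}")
```

Stage 2 sorts the validation scores once and sweeps the candidate thresholds, so tuning costs O(n log n). Only cut points between distinct scores are evaluated, because examples with tied scores must fall on the same side of the threshold.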