DeepTopPush: Simple and Scalable Method for Accuracy at the Top
DeepTopPush addresses the need for efficient classification in applications such as information retrieval and drug testing, where manual postprocessing is expensive; it is, however, an incremental improvement over existing methods for these known bottlenecks.
The paper tackles accuracy at the top in binary classification, where performance is evaluated only on a small number of relevant samples. It proposes DeepTopPush, a method that modifies stochastic gradient descent to handle non-decomposable loss functions, and reports strong results, including detecting 46% of malware at a false alarm rate of $10^{-5}$ in a real-world application.
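The malware figure is a true-positive rate measured at a fixed, very low false-positive rate. The sketch below shows one way such a number can be computed from classifier scores. It assumes binary labels (1 = relevant, e.g. malware; 0 = irrelevant) and counts only scores strictly above the threshold as alarms. The function name `tpr_at_fpr` is hypothetical, and this is not the paper's evaluation code.

```python
import math
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], fpr: float) -> float:
    """Fraction of relevant samples (label 1) scored strictly above a threshold
    chosen so that at most floor(fpr * n_neg) irrelevant samples (label 0)
    exceed it. Hypothetical helper, not the paper's evaluation code."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant sample")

    allowed = math.floor(fpr * len(neg))  # number of false alarms we accept
    if allowed >= len(neg):
        return 1.0  # every irrelevant sample may raise an alarm

    # neg[allowed] is the (allowed+1)-th highest irrelevant score; at most
    # `allowed` irrelevant samples lie strictly above it.
    threshold = neg[allowed]
    return sum(s > threshold for s in pos) / len(pos)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    print(tpr_at_fpr(scores, labels, 0.25))  # 0.75
```

Because at most $\lfloor \mathrm{fpr} \cdot n_{\mathrm{neg}} \rfloor$ false alarms are allowed, a false alarm rate of $10^{-5}$ tolerates no false alarm at all unless the evaluation set contains at least 100,000 irrelevant samples. Metrics this far into the tail therefore require very large evaluation sets.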
Accuracy at the top is a special class of binary classification problems where the performance is evaluated only on a small number of relevant (top) samples. Applications include information retrieval systems or processes with manual (expensive) postprocessing. This leads to minimizing the number of irrelevant samples above a threshold. We consider classifiers in the form of an arbitrary (deep) network and propose a new method, DeepTopPush, for minimizing the loss function at the top. Since the threshold depends on all samples, the problem is non-decomposable. We modify stochastic gradient descent to handle the non-decomposability in an end-to-end training manner and propose a way to estimate the threshold only from values on the current minibatch and one delayed value. We demonstrate the excellent performance of DeepTopPush on visual recognition datasets and two real-world applications. The first one selects a small number of molecules for further drug testing. The second one uses real malware data, where we detected 46\% of malware at an extremely low false alarm rate of $10^{-5}$.
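The abstract does not spell out the loss or the exact threshold estimate, so the following is only a minimal sketch of the minibatch idea under stated assumptions, not the authors' implementation. It assumes that (i) as in the earlier TopPush formulation, the threshold is the highest score among irrelevant (negative) samples, and the loss is a logistic (softplus) surrogate of the fraction of relevant (positive) samples scored below it; (ii) the "delayed value" is the negative sample that attained the threshold in the previous step, re-scored with the current weights and added to the minibatch; and (iii) a linear scorer stands in for the deep network. All names (`score`, `sgd_step`, `delayed`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Batch = Sequence[Tuple[Vector, int]]  # (features, label); label 1 = relevant


def score(w: Vector, x: Vector) -> float:
    """Hypothetical linear scorer standing in for the (deep) network."""
    return sum(wi * xi for wi, xi in zip(w, x))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)), a smooth surrogate of the step z > 0."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    """Derivative of softplus, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(w: Vector, batch: Batch, delayed: Optional[Vector],
             lr: float) -> Tuple[Vector, float, Vector]:
    """One minibatch step; returns (new_w, loss, new_delayed)."""
    positives = [x for x, y in batch if y == 1]
    negatives = [x for x, y in batch if y == 0]
    if delayed is not None:
        # Assumption: the delayed value is the previous threshold sample,
        # re-scored below with the current weights.
        negatives.append(delayed)
    if not positives or not negatives:
        raise ValueError("minibatch needs a positive and a negative (or delayed) sample")

    # Threshold estimate: highest negative score in minibatch + delayed sample.
    top_neg = max(negatives, key=lambda x: score(w, x))
    t = score(w, top_neg)

    # Surrogate for the fraction of positives scored below the threshold.
    n = len(positives)
    loss = sum(softplus(t - score(w, x)) for x in positives) / n

    # (Sub)gradient: d softplus(t - s_i)/dw = sigmoid(t - s_i) * (top_neg - x_i),
    # since t = score(w, top_neg) is locally linear in w.
    grad = [0.0] * len(w)
    for x in positives:
        g = sigmoid(t - score(w, x)) / n
        for j in range(len(w)):
            grad[j] += g * (top_neg[j] - x[j])

    new_w = [wj - lr * gj for wj, gj in zip(w, grad)]
    # The sample attaining the threshold becomes the delayed value.
    return new_w, loss, top_neg


if __name__ == "__main__":
    # Toy data: relevant samples have a large first feature.
    batches = [
        [([1.0, 0.2], 1), ([0.1, 0.9], 0), ([0.8, 0.1], 1), ([0.3, 0.7], 0)],
        [([0.9, 0.3], 1), ([0.2, 0.8], 0), ([0.7, 0.2], 1), ([0.4, 0.6], 0)],
    ]
    w: Vector = [0.0, 0.0]
    delayed: Optional[Vector] = None
    for epoch in range(50):
        for batch in batches:
            w, loss, delayed = sgd_step(w, batch, delayed, lr=0.5)
    print(f"w={w} last loss={loss:.4f}")
```

In this sketch the threshold is a maximum, so its (sub)gradient flows only through the single negative that attains it. Carrying that one sample over to the next minibatch keeps the estimate from depending on the full dataset while still letting the threshold change as the weights move.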