LGMEJun 23, 2022

Optimizing Two-way Partial AUC with an End-to-end Framework

arXiv:2206.11655v129 citationsh-index: 82
Originality Incremental advance
AI Analysis

This work addresses a new inconsistency in partial AUC metrics for machine learning practitioners, offering a solution to optimize TPAUC, though it appears incremental as it builds on existing partial AUC concepts.

The paper tackles the problem of optimizing the Two-way Partial AUC (TPAUC), a metric that evaluates classifier performance in a specific region of high true positive rates and low false positive rates, by proposing an end-to-end framework that achieves efficient training and shows good generalization with empirical validation on benchmark datasets.

The Area Under the ROC Curve (AUC) is a crucial metric for machine learning, which evaluates the average performance over all possible True Positive Rates (TPRs) and False Positive Rates (FPRs). Based on the knowledge that a skillful classifier should simultaneously embrace a high TPR and a low FPR, we turn to study a more general variant called Two-way Partial AUC (TPAUC), where only the region with $\mathsf{TPR} \ge α, \mathsf{FPR} \le β$ is included in the area. Moreover, recent work shows that the TPAUC is essentially inconsistent with the existing Partial AUC metrics where only the FPR range is restricted, opening a new problem to seek solutions to leverage high TPAUC. Motivated by this, we present the first trial in this paper to optimize this new metric. The critical challenge along this course lies in the difficulty of performing gradient-based optimization with end-to-end stochastic training, even with a proper choice of surrogate loss. To address this issue, we propose a generic framework to construct surrogate optimization problems, which supports efficient end-to-end training with deep learning. Moreover, our theoretical analyses show that: 1) the objective function of the surrogate problems will achieve an upper bound of the original problem under mild conditions, and 2) optimizing the surrogate problems leads to good generalization performance in terms of TPAUC with a high probability. Finally, empirical studies over several benchmark datasets speak to the efficacy of our framework.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes