LG | Nov 20, 2023

Leveraging Uncertainty Estimates To Improve Classifier Performance

arXiv:2311.11723v1 | 1 citation | h-index: 59
Originality: Highly original
AI Analysis

This paper addresses performance degradation in binary classifiers caused by sampling bias or distributional drift, offering a practical improvement for applications that require high precision.

The paper tackles the problem of model score misalignment with true positivity rates in binary classification, showing that incorporating uncertainty estimates into decision boundary selection yields 25%-40% recall gains at high precision bounds compared to using model scores alone.

Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen based on the application requirements (e.g., maximizing recall for a precision bound). However, model scores are often not aligned with the true positivity rate. This is especially true when training involves differential sampling across classes or there is distributional drift between train and test settings. In this paper, we provide theoretical analysis and empirical evidence of the dependence of model score estimation bias on both uncertainty and the score itself. Further, we formulate the decision boundary selection in terms of both model score and uncertainty, prove that it is NP-hard, and present algorithms based on dynamic programming and isotonic regression. Evaluation of the proposed algorithms on three real-world datasets yields a 25%-40% gain in recall at high precision bounds over the traditional approach of using model score alone, highlighting the benefits of leveraging uncertainty.
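
To make the abstract's setup concrete, the following is a minimal sketch of "maximizing recall for a precision bound", first with a single score threshold and then with one threshold per uncertainty level. It is not the paper's dynamic programming or isotonic regression algorithm: it is an exhaustive search on hand-made toy data, and the discrete uncertainty bins, the function names (`best_single_threshold`, `best_per_bin_thresholds`), and the data are all assumptions made for illustration.

```python
import itertools
import numpy as np

def precision_recall(labels, selected):
    """Precision and recall of a boolean selection mask against 0/1 labels."""
    tp = int(np.sum(labels[selected]))
    n_sel = int(np.sum(selected))
    n_pos = int(np.sum(labels))
    precision = tp / n_sel if n_sel else 1.0  # empty selection: vacuously precise
    recall = tp / n_pos if n_pos else 0.0
    return precision, recall

def best_single_threshold(scores, labels, min_precision):
    """Baseline: one score threshold maximizing recall with precision >= bound."""
    best = (0.0, None)  # (recall, threshold)
    for t in np.unique(scores):
        p, r = precision_recall(labels, scores >= t)
        if p >= min_precision and r > best[0]:
            best = (r, t)
    return best

def best_per_bin_thresholds(scores, bins, labels, min_precision):
    """Exhaustive search over one score threshold per uncertainty bin.

    The number of combinations grows exponentially with the number of bins,
    so this is only practical for toy inputs.
    """
    bin_ids = np.unique(bins)
    # Candidates per bin: each observed score, plus +inf (select nothing).
    candidates = [np.append(np.unique(scores[bins == b]), np.inf) for b in bin_ids]
    best = (0.0, None)  # (recall, {bin_id: threshold})
    for combo in itertools.product(*candidates):
        selected = np.zeros(len(scores), dtype=bool)
        for b, t in zip(bin_ids, combo):
            selected |= (bins == b) & (scores >= t)
        p, r = precision_recall(labels, selected)
        if p >= min_precision and r > best[0]:
            best = (r, dict(zip(bin_ids.tolist(), combo)))
    return best

# Toy data (hypothetical): bin 0 = low uncertainty, bin 1 = high uncertainty.
# Scores in the high-uncertainty bin overstate the true positivity rate.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.9, 0.8, 0.7, 0.6])
bins = np.array([0, 0, 0, 0, 1, 1, 1, 1])
labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])

single_recall, single_t = best_single_threshold(scores, labels, min_precision=0.9)
bin_recall, bin_ts = best_per_bin_thresholds(scores, bins, labels, min_precision=0.9)
print("single threshold:", single_t, "recall:", single_recall)
print("per-bin thresholds:", bin_ts, "recall:", bin_recall)
```

On this toy input, a single threshold has to be set high enough to exclude the over-scored high-uncertainty negatives, which also discards reliable low-uncertainty positives. Allowing the threshold to vary with uncertainty recovers them at the same precision bound, which is the intuition behind selecting the decision boundary over both score and uncertainty.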

