LG, ML. May 26, 2019

Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions

arXiv:1905.10713v3
30 citations
Originality: Incremental advance
AI Analysis

The work addresses reliability issues in practical systems such as online advertising, but it is an incremental improvement over existing calibration methods.

The paper tackles the problem of miscalibration in machine learning predictions, where model probabilities do not match actual outcomes, and proposes Neural Calibration, a post-hoc method that improves calibration and other metrics like AUC across five large-scale datasets.

It is often observed that the probabilistic predictions of a machine learning model disagree with the averaged actual outcomes on specific subsets of data, an issue known as miscalibration. Miscalibration undermines the reliability of practical machine learning systems. For example, in online advertising, an ad can receive a predicted click-through rate of 0.1 over some population of users where its actual click rate is 0.15. In such cases, the probabilistic predictions have to be fixed before the system can be deployed. In this paper, we first introduce a new evaluation metric, the field-level calibration error, which measures the bias in predictions over a sensitive input field that the decision-maker cares about. We show that existing post-hoc calibration methods yield limited improvements in the new field-level metric and in non-calibration metrics such as the AUC score. To this end, we propose Neural Calibration, a simple yet powerful post-hoc calibration method that learns to calibrate by making full use of the field-aware information over the validation set. We present extensive experiments on five large-scale datasets. The results show that Neural Calibration significantly improves on uncalibrated predictions in common metrics such as the negative log-likelihood, Brier score, and AUC, as well as in the proposed field-level calibration error.
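The abstract does not spell out the formula for the field-level calibration error, so the following is only a minimal sketch of one plausible reading: for each value of the chosen field, compare the mean prediction with the mean outcome, then average the absolute gaps weighted by how many samples fall under each value. The function name `field_level_calibration_error` and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def field_level_calibration_error(fields, predictions, labels):
    """Size-weighted mean over field values of |mean outcome - mean prediction|.

    Assumed definition (not quoted from the paper): sum the residuals
    (label - prediction) within each field value, take absolute values,
    add them up, and divide by the total number of samples.
    """
    if not (len(fields) == len(predictions) == len(labels)):
        raise ValueError("fields, predictions and labels must have equal length")
    if len(fields) == 0:
        raise ValueError("at least one sample is required")

    residual_sum = defaultdict(float)
    for z, p, y in zip(fields, predictions, labels):
        residual_sum[z] += y - p  # signed bias accumulated per field value

    return sum(abs(r) for r in residual_sum.values()) / len(fields)


# Toy data mirroring the abstract's example: ad "A" is predicted at 0.1 but
# clicks at 0.15 (3 of 20); ad "B" is predicted at 0.2 but clicks at 0.15.
fields = ["A"] * 20 + ["B"] * 20
predictions = [0.1] * 20 + [0.2] * 20
labels = [1] * 3 + [0] * 17 + [1] * 3 + [0] * 17

# Total predicted clicks and total observed clicks are both 6, so the
# predictions look calibrated in aggregate, yet each ad is off by 0.05.
print(round(field_level_calibration_error(fields, predictions, labels), 6))  # 0.05
```

In this toy case the under-prediction on one ad and the over-prediction on the other cancel in the overall average, which is the kind of per-field bias that an aggregate check would miss.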

