LG ST ML | Feb 15, 2021

Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification

Yu Bai, Song Mei, Huan Wang, Caiming Xiong

arXiv:2102.07856v2 | 21.047 citations

Originality: Incremental advance

AI Analysis

This work addresses calibration in machine learning models and provides theoretical insights that challenge a common belief, although it is an incremental refinement of existing understanding.

The paper tackles miscalibration in binary classification. It shows theoretically that over-parametrization is not the sole cause of over-confidence. It also proves that logistic regression is inherently over-confident even in the under-parametrized, realizable setting, and it verifies these findings on simulations and real data.

Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident. It is commonly believed that such over-confidence is mainly due to over-parametrization, in particular when the model is large enough to memorize the training data and maximize the confidence. In this paper, we show theoretically that over-parametrization is not the only reason for over-confidence. We prove that logistic regression is inherently over-confident, in the realizable, under-parametrized setting where the data is generated from the logistic model, and the sample size is much larger than the number of parameters. Further, this over-confidence happens for general well-specified binary classification problems as long as the activation is symmetric and concave on the positive part. Perhaps surprisingly, we also show that over-confidence is not always the case -- there exists another activation function (and a suitable loss function) under which the learned classifier is under-confident at some probability values. Overall, our theory provides a precise characterization of calibration in realizable binary classification, which we verify on simulations and real data experiments.
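The following is a minimal simulation sketch of the realizable, under-parametrized setting described in the abstract. It is not the paper's experimental setup. It assumes standard Gaussian inputs, a hypothetical ground-truth vector `w_star` of norm 2, n = 1000 samples and d = 100 parameters. Labels are drawn from the logistic model, and the sketch fits the unregularized logistic MLE with Newton's method. On fresh test points it then reports the average of (confidence minus true probability of the predicted label). A positive value indicates over-confidence on average. The helper names `fit_logistic_mle` and `overconfidence_gap` are illustrative only.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_mle(X, y, iters=50, tol=1e-10):
    """Unregularized logistic-regression MLE via Newton's method with backtracking."""
    _, d = X.shape
    w = np.zeros(d)

    def nll(v):
        # Negative log-likelihood: sum_i [log(1 + exp(z_i)) - y_i * z_i]
        z = X @ v
        return np.sum(np.logaddexp(0.0, z) - y * z)

    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                           # gradient of the NLL
        hess = (X * (p * (1.0 - p))[:, None]).T @ X    # X^T diag(p(1-p)) X
        step = np.linalg.solve(hess, grad)
        # Halve the Newton step until the loss no longer increases.
        t, f0 = 1.0, nll(w)
        while nll(w - t * step) > f0 and t > 1e-8:
            t *= 0.5
        w = w - t * step
        if np.linalg.norm(t * step) < tol:
            break
    return w


def overconfidence_gap(n=1000, d=100, signal=2.0, n_test=20000, seed=0):
    """Average (confidence - true prob. of predicted label) on fresh test data.

    Hypothetical setup: x ~ N(0, I_d), y ~ Bernoulli(sigmoid(x @ w_star)).
    """
    rng = np.random.default_rng(seed)
    w_star = np.full(d, signal / np.sqrt(d))           # ||w_star|| = signal
    X = rng.standard_normal((n, d))
    y = (rng.random(n) < sigmoid(X @ w_star)).astype(float)
    w_hat = fit_logistic_mle(X, y)

    X_test = rng.standard_normal((n_test, d))
    p_hat = sigmoid(X_test @ w_hat)                    # learned P(y=1 | x)
    p_true = sigmoid(X_test @ w_star)                  # true P(y=1 | x)
    pred_pos = p_hat >= 0.5
    confidence = np.where(pred_pos, p_hat, 1.0 - p_hat)
    true_prob_of_pred = np.where(pred_pos, p_true, 1.0 - p_true)
    return float(np.mean(confidence - true_prob_of_pred)), w_hat, w_star


if __name__ == "__main__":
    gap, w_hat, w_star = overconfidence_gap()
    print(f"mean confidence minus true probability: {gap:.4f}")
    print(f"||w_hat|| = {np.linalg.norm(w_hat):.3f}, ||w_star|| = {np.linalg.norm(w_star):.3f}")
```

Under the abstract's claim, the printed gap should come out positive even though n is ten times d. The fitted vector is typically longer than `w_star`, which pushes predicted probabilities toward 0 and 1. This is an averaged check over test points, not a per-probability-level calibration curve of the kind the paper characterizes.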

