ST · LG · ME · ML · Feb 21, 2025

Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling

arXiv:2502.15131v3 · 2 citations · h-index: 4
Originality: Highly original
AI Analysis

This paper provides a provably optimal calibration strategy for high-dimensional binary classification, addressing a fundamental issue in the reliability of machine learning models, although its contribution is incremental in that it extends calibration theory to the high-dimensional regime.

The paper tackles the calibration of linear binary classifiers in high-dimensional settings. It introduces an angular calibration method that interpolates the classifier with a noninformative chance classifier, and proves that the resulting predictor is well-calibrated and Bregman-optimal as the sample size and the number of features diverge at a comparable rate. The interpolation weight depends on the angle between the estimated and true weights, which can be consistently estimated. The paper also shows that, under certain conditions, Platt scaling converges to this optimal predictor and therefore provably inherits the same properties.
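The interpolation idea can be made concrete under a Gaussian model. As a minimal sketch, assume an isotropic design $x \sim N(0, I_p)$, a logistic link, and an estimator $\hat{w}$ held fixed (independent of the test point). Write $u = \hat{w}^\top x / \|\hat{w}\|$ and $\theta = \angle(\hat{w}, w_\star)$. Then, jointly with $u$, $w_\star^\top x$ has the same law as $\|w_\star\|(\cos\theta\, u + \sin\theta\, Z)$ with $Z \sim N(0, 1)$ independent of $u$, so $P(y = 1 \mid u) = \mathbb{E}_Z[\sigma(\|w_\star\|(\cos\theta\, u + \sin\theta\, Z))]$, where the $\sin\theta$ term acts like a noninformative chance classifier. This is our reading of the construction, not code from the paper: the names `angular_predictor` and `calibration_gaps`, the synthetic setup, and the use of the oracle angle and oracle $\|w_\star\|$ (rather than consistent estimates) are all assumptions for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic link sigma(t) = 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def gaussian_expectation(f, num=161, lim=8.0):
    """Approximate E[f(Z)], Z ~ N(0, 1), by the trapezoidal rule on [-lim, lim]."""
    h = 2.0 * lim / (num - 1)
    total = 0.0
    for k in range(num):
        z = -lim + k * h
        weight = 0.5 if k in (0, num - 1) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def angle(v, w):
    """Angle between two vectors, in radians."""
    cos = sum(a * b for a, b in zip(v, w)) / (norm(v) * norm(w))
    return math.acos(max(-1.0, min(1.0, cos)))


def angular_predictor(score, w_hat_norm, w_star_norm, theta):
    """Hypothetical sketch of P(y = 1 | w_hat^T x = score) when x ~ N(0, I).

    The normalized score u is mixed with an independent N(0, 1) "chance"
    direction Z using weights cos(theta) and sin(theta).
    """
    u = score / w_hat_norm
    c, s = math.cos(theta), math.sin(theta)
    return gaussian_expectation(lambda z: sigmoid(w_star_norm * (c * u + s * z)))


def calibration_gaps(p=100, n_test=4000, noise_norm=2.0, seed=0):
    """Average (true probability - prediction) over test points where the raw
    predictor sigmoid(w_hat^T x) exceeds 0.9, for the raw and angular predictors."""
    rng = random.Random(seed)
    w_star = [2.0 / math.sqrt(p)] * p  # assumed truth with ||w_star|| = 2
    # Stand-in estimator: w_star plus isotropic noise of norm roughly noise_norm.
    w_hat = [w + rng.gauss(0.0, noise_norm / math.sqrt(p)) for w in w_star]
    theta = angle(w_hat, w_star)  # oracle angle; in practice it would be estimated
    nh, ns = norm(w_hat), norm(w_star)
    gap_raw = gap_ang = 0.0
    count = 0
    for _ in range(n_test):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        score = sum(a * b for a, b in zip(w_hat, x))
        raw = sigmoid(score)
        if raw <= 0.9:  # selection depends on x only through w_hat^T x
            continue
        truth = sigmoid(sum(a * b for a, b in zip(w_star, x)))  # P(y = 1 | x)
        gap_raw += truth - raw
        gap_ang += truth - angular_predictor(score, nh, ns, theta)
        count += 1
    return gap_raw / count, gap_ang / count, theta
```

Calling `calibration_gaps()` returns the two average gaps and the angle. Because the selected points are chosen through $\hat{w}^\top x$ alone, the angular predictor's gap should be near zero up to Monte Carlo noise, while the raw predictor $\sigma(\hat{w}^\top x)$ shows a clearly negative gap, i.e. it is overconfident on the points it scores highest. At $\theta = 0$ the sketch reduces to $\sigma(w_\star^\top x)$, and at $\theta = \pi/2$ it returns the chance value $1/2$.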

We study the fundamental problem of calibrating a linear binary classifier of the form $σ(\hat{w}^\top x)$, where the feature vector $x$ is Gaussian, $σ$ is a link function, and $\hat{w}$ is an estimator of the true linear weight $w_\star$. By interpolating with a noninformative $\textit{chance classifier}$, we construct a well-calibrated predictor whose interpolation weight depends on the angle $\angle(\hat{w}, w_\star)$ between the estimator $\hat{w}$ and the true linear weight $w_\star$. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the number of samples and the number of features both diverge at a comparable rate. The angle $\angle(\hat{w}, w_\star)$ can be consistently estimated. Furthermore, the resulting predictor is uniquely $\textit{Bregman-optimal}$: it minimizes the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that provably satisfies both calibration and optimality properties in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution. Thus, Platt scaling also provably inherits these desirable properties in high dimensions.
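Platt scaling, as named in the abstract, fits a two-parameter logistic map $\sigma(a\,\hat{w}^\top x + b)$ to labels from a held-out calibration sample. The sketch below assumes a logistic link and fits $(a, b)$ by Newton's method on the log-loss. The names `platt_scale` and `simulate_platt` are hypothetical, and the synthetic setup (isotropic Gaussian features, $\|w_\star\| = 2$, $\hat{w}$ equal to $w_\star$ plus Gaussian noise) is ours, not the paper's; it does not check the conditions under which the paper proves convergence to the Bregman-optimal predictor.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic link sigma(t) = 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def platt_scale(scores, labels, iters=50, tol=1e-10):
    """Fit P(y = 1 | s) = sigmoid(a * s + b) by Newton's method on the log-loss."""
    a, b = 0.0, 0.0  # start from the chance predictor (all probabilities 1/2)
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for s, y in zip(scores, labels):
            q = sigmoid(a * s + b)
            r = q - y  # d(log-loss)/d(logit)
            v = q * (1.0 - q)  # d^2(log-loss)/d(logit)^2
            ga += r * s
            gb += r
            haa += v * s * s
            hab += v * s
            hbb += v
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system H (da, db) = (ga, gb).
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a - da, b - db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def simulate_platt(p=100, n_cal=3000, noise_norm=2.0, seed=1):
    """Fit Platt scaling on held-out scores from a noisy linear estimator."""
    rng = random.Random(seed)
    w_star = [2.0 / math.sqrt(p)] * p  # assumed truth with ||w_star|| = 2
    w_hat = [w + rng.gauss(0.0, noise_norm / math.sqrt(p)) for w in w_star]
    scores, labels = [], []
    for _ in range(n_cal):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        scores.append(sum(a * b for a, b in zip(w_hat, x)))  # raw score w_hat^T x
        prob = sigmoid(sum(a * b for a, b in zip(w_star, x)))  # true P(y = 1 | x)
        labels.append(1.0 if rng.random() < prob else 0.0)
    return platt_scale(scores, labels)
```

In this setup the fitted slope from `simulate_platt()` comes out well below 1 and the intercept near 0, so Platt scaling shrinks the overconfident raw scores toward the chance value $1/2$, qualitatively the same effect as the $\sin\theta$ chance term in angular calibration.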
