ML LG ST ME Nov 26, 2025

Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification

Soumojit Das, Nairanjana Dasgupta, Prashanta Dutta

arXiv:2511.20960v24.5

Originality: Highly original

AI Analysis

This work addresses the need for reliable uncertainty-aware classification in critical AI systems, offering a novel method with theoretical guarantees for applications requiring rigorous validation.

The paper addresses uncertainty quantification in multi-class classification with a geometric framework that calibrates probability outputs and assigns instance-level reliability scores. On Adeno-Associated Virus classification, deferring 34.5% of samples to human review reduces the automated decision error rate from 16.8% to 6.9%.

Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain -- even well-calibrated models provide no mechanism to identify \textit{which specific predictions} are unreliable. We develop a geometric framework addressing both calibration and instance-level uncertainty quantification for neural network probability outputs. Treating probability vectors as points on the $(c-1)$-dimensional probability simplex equipped with the Fisher--Rao metric, we construct: (i) Additive Log-Ratio (ALR) calibration maps that reduce exactly to Platt scaling for binary problems while extending naturally to multi-class settings, and (ii) geometric reliability scores that translate calibrated probabilities into actionable uncertainty measures, enabling principled deferral of ambiguous predictions to human review. Theoretical contributions include: consistency of the calibration estimator at rate $O_p(n^{-1/2})$ via M-estimation theory (Theorem~1), and tight concentration bounds for reliability scores with explicit sub-Gaussian parameters enabling sample size calculations for validation set design (Theorem~2). We conjecture Neyman--Pearson optimality of our neutral zone construction based on connections to Bhattacharyya coefficients. Empirical validation on Adeno-Associated Virus classification demonstrates that the two-stage framework captures 72.5\% of errors while deferring 34.5\% of samples, reducing automated decision error rates from 16.8\% to 6.9\%. Notably, calibration alone yields marginal accuracy gains; the operational benefit arises primarily from the reliability scoring mechanism, which applies to any well-calibrated probability output. This work bridges information geometry and statistical learning, offering formal guarantees for uncertainty-aware classification in applications requiring rigorous validation.
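The claim that ALR calibration reduces to Platt scaling for binary problems can be illustrated with a minimal sketch. It assumes the calibration map is affine in ALR coordinates (z -> A z + b) with the last class as the reference; the abstract does not give the exact parameterization, so this form, the function names (`alr`, `alr_inv`, `alr_calibrate`, `platt`, `fisher_rao`), and the parameter values are hypothetical. The Fisher--Rao distance shown is the standard closed form on the simplex, 2 arccos of the Bhattacharyya coefficient; it is not the paper's reliability score, whose definition is not stated here.

```python
import numpy as np

def alr(p):
    """Additive log-ratio transform: c-part simplex -> R^(c-1).
    Assumption: the last class is used as the reference component."""
    p = np.asarray(p, dtype=float)
    return np.log(p[..., :-1]) - np.log(p[..., -1:])

def alr_inv(z):
    """Inverse ALR: append a zero for the reference class, then softmax."""
    z = np.asarray(z, dtype=float)
    z_full = np.concatenate([z, np.zeros(z.shape[:-1] + (1,))], axis=-1)
    z_full = z_full - z_full.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z_full)
    return e / e.sum(axis=-1, keepdims=True)

def alr_calibrate(p, A, b):
    """Hypothetical calibration map: affine in ALR coordinates, z -> A z + b."""
    z = alr(p)
    return alr_inv(z @ np.asarray(A, dtype=float).T + np.asarray(b, dtype=float))

def platt(p1, a, b):
    """Platt scaling applied to the logit of the positive-class probability."""
    p1 = np.asarray(p1, dtype=float)
    logit = np.log(p1) - np.log1p(-p1)
    return 1.0 / (1.0 + np.exp(-(a * logit + b)))

def fisher_rao(p, q):
    """Fisher-Rao geodesic distance on the simplex:
    2 * arccos(Bhattacharyya coefficient)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q), axis=-1)
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))

# Binary case: the ALR coordinate is logit(p1), so the affine map is Platt scaling.
p_bin = np.array([[0.8, 0.2], [0.3, 0.7]])
a, b = 1.5, -0.2  # illustrative parameters, not fitted values
cal_bin = alr_calibrate(p_bin, [[a]], [b])
platt_bin = platt(p_bin[:, 0], a, b)

# Multi-class case: the identity map leaves probabilities unchanged.
p_multi = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
cal_multi = alr_calibrate(p_multi, np.eye(2), np.zeros(2))
```

In this sketch `cal_bin[:, 0]` and `platt_bin` agree up to floating-point error, which is the binary reduction the abstract describes. In practice A and b would be fitted on a held-out calibration set, for example by minimizing negative log-likelihood; that fitting step is omitted here.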
