Consistency and Finite Sample Behavior of Binary Class Probability Estimation
In this work we investigate to what extent class probabilities can be recovered within the empirical risk minimization (ERM) paradigm. The main aim of our paper is to extend existing results and to emphasize the close relationship between empirical risk minimization and class probability estimation. Building on the existing literature on excess risk bounds and proper scoring rules, we derive a class probability estimator based on empirical risk minimization. We then derive fairly general conditions under which this estimator converges in probability, with respect to the L1-norm, to the true class probabilities. Our main contribution is a method for deriving finite-sample L1-convergence rates of this estimator for different surrogate loss functions. We also study in detail which commonly used loss functions are suitable for this estimation problem, and finally we discuss the setting of model misspecification as well as a possible extension to asymmetric loss functions.
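The abstract does not spell out the estimator itself. What follows is a minimal sketch built on standard facts from the surrogate-loss literature, not necessarily the paper's construction. It assumes margin-based losses phi(y * f(x)) with y in {-1, +1} and the classical inverse links for the logistic and exponential losses. It illustrates two points. First, at the population level, minimizing the conditional risk eta * phi(f) + (1 - eta) * phi(-f) and then inverting the link recovers eta for the logistic and exponential losses, whereas the hinge-loss minimizer only encodes sign(2 * eta - 1). Second, at the sample level, a plug-in estimator obtained by ERM over a linear class has a small Monte Carlo L1 error when the model is well specified. The data-generating model `true_eta`, the sample sizes, and the gradient-descent settings are hypothetical choices made for illustration.

```python
import math
import random

# Margin-based surrogate losses phi(v), with margin v = y * f(x) and y in {-1, +1}.
def logistic(v):
    return math.log1p(math.exp(-v))

def exponential(v):
    return math.exp(-v)

def hinge(v):
    return max(0.0, 1.0 - v)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Standard inverse links: map the conditional-risk minimizer f* back to eta.
def inv_link_logistic(f):
    return sigmoid(f)          # logistic loss: f* = log(eta / (1 - eta))

def inv_link_exponential(f):
    return sigmoid(2.0 * f)    # exponential loss: f* = 0.5 * log(eta / (1 - eta))

def conditional_risk(phi, eta, f):
    """Pointwise risk C_eta(f) = eta * phi(f) + (1 - eta) * phi(-f)."""
    return eta * phi(f) + (1.0 - eta) * phi(-f)

def argmin_conditional_risk(phi, eta, lo=-10.0, hi=10.0, steps=20001):
    """Grid search for f* = argmin_f C_eta(f) on [lo, hi] (step 0.001 by default)."""
    best_f, best_val = lo, float("inf")
    for i in range(steps):
        f = lo + (hi - lo) * i / (steps - 1)
        val = conditional_risk(phi, eta, f)
        if val < best_val:
            best_f, best_val = f, val
    return best_f

def fit_logistic_erm(xs, ys, lr=1.0, epochs=200):
    """Minimize the empirical logistic risk (1/n) sum_i phi(y_i * (a * x_i + b)) by gradient descent."""
    a, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            # d/df log(1 + exp(-y f)) = -y * sigmoid(-y f)
            g = -y * sigmoid(-y * (a * x + b))
            ga += g * x
            gb += g
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

# Population level: logistic and exponential losses recover eta,
# while the hinge minimizer is +/-1, i.e. only sign(2 * eta - 1).
for eta in (0.1, 0.3, 0.7):
    print(eta,
          round(inv_link_logistic(argmin_conditional_risk(logistic, eta)), 3),
          round(inv_link_exponential(argmin_conditional_risk(exponential, eta)), 3),
          round(argmin_conditional_risk(hinge, eta), 3))

# Finite sample: ERM over a well-specified linear class, then plug the
# fitted score into the logistic inverse link to estimate eta(x).
rng = random.Random(0)

def true_eta(x):  # hypothetical data-generating model
    return sigmoid(1.5 * x - 0.5)

xs = [rng.uniform(-2.0, 2.0) for _ in range(2000)]
ys = [1 if rng.random() < true_eta(x) else -1 for x in xs]
a_hat, b_hat = fit_logistic_erm(xs, ys)

# Monte Carlo estimate of E|eta_hat(X) - eta(X)| on fresh draws of X.
test_xs = [rng.uniform(-2.0, 2.0) for _ in range(5000)]
l1_error = sum(abs(sigmoid(a_hat * x + b_hat) - true_eta(x)) for x in test_xs) / len(test_xs)
print("estimated L1 error:", round(l1_error, 4))
```

The first loop is a quick check of which losses are suitable for probability estimation: for eta = 0.1 and eta = 0.3 the hinge minimizer is the same value (-1), so no inverse link can separate them. If `true_eta` were not a sigmoid of a linear function (model misspecification), the same ERM procedure would in general converge to the best approximation within the linear class, and the L1 error would not vanish as the sample size grows.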