Resampled Confidence Regions with Exponential Shrinkage for the Regression Function of Binary Classification
This work addresses the need for reliable uncertainty quantification in binary classification. It offers a method that applies to almost arbitrary model classes, though the contribution is incremental in that it extends existing resampling techniques.
The paper tackles the problem of constructing distribution-free confidence regions for the regression function in binary classification, providing exponential bounds on the $L_2$ sizes of these regions and demonstrating the approach on specific models such as logistic regression.
The regression function is one of the key objects of binary classification: it not only determines a Bayes optimal classifier, and hence an optimal decision boundary, but also encodes the conditional distribution of the output given the input. In this paper we build distribution-free confidence regions for the regression function for any user-chosen confidence level and any finite sample size, based on a resampling test. These regions are abstract, as the model class can be almost arbitrary; for example, it does not have to be finitely parameterized. We prove the strong uniform consistency of a new approach based on empirical risk minimization for model classes with finite pseudo-dimensions and inverse Lipschitz parameterizations. We provide exponential probably approximately correct (PAC) bounds on the $L_2$ sizes of these regions, and demonstrate the ideas on specific models. We also consider a method based on $k$-nearest neighbors, for which we prove strong pointwise bounds on the probability of exclusion. Finally, the constructions are illustrated on a logistic model class and compared to the asymptotic ellipsoids of the maximum likelihood estimator.
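To make the idea of a resampling-based membership test concrete, the following is a minimal Python sketch of a generic rank-based Monte Carlo test; it is not the paper's exact construction. It assumes labels in $\{-1, +1\}$, so that a candidate regression function $f(x) = \mathbb{E}[Y \mid X = x]$ gives $P(Y = 1 \mid X = x) = (1 + f(x))/2$, and it assumes scalar inputs. The ranking statistic (a $k$-nearest-neighbor smoothed squared discrepancy) and all names (`knn_estimate`, `in_confidence_region`, `m`, `q`, `k`) are illustrative choices, not taken from the source.

```python
import numpy as np

def knn_estimate(x, y, k):
    """Average of the labels of the k nearest sample points (1-D inputs).
    Each point counts as its own nearest neighbor."""
    dist = np.abs(x[:, None] - x[None, :])
    idx = np.argsort(dist, axis=1)[:, :k]
    return y[idx].mean(axis=1)

def in_confidence_region(f, x, y, m=20, q=19, k=5, rng=None):
    """Hypothetical rank test: accept candidate f if the original sample's
    statistic ranks among the q smallest of m (1 original + m-1 resampled)."""
    rng = np.random.default_rng(rng)
    fx = np.clip(f(x), -1.0, 1.0)      # candidate values of E[Y | X = x]
    p_plus = (1.0 + fx) / 2.0          # implied P(Y = +1 | X = x)

    def stat(labels):
        # Assumed statistic: mean squared gap between kNN estimate and f.
        return np.mean((knn_estimate(x, labels, k) - fx) ** 2)

    z = np.empty(m)
    z[0] = stat(y)                     # original labels
    for j in range(1, m):
        # Alternative labels drawn as if the candidate f were the truth.
        y_alt = np.where(rng.random(x.shape[0]) < p_plus, 1.0, -1.0)
        z[j] = stat(y_alt)

    tie = rng.permutation(m)           # random tie-breaking order
    rank = 1 + np.sum((z < z[0]) | ((z == z[0]) & (tie < tie[0])))
    return bool(rank <= q)

# Usage on synthetic data from a logistic-type model (an assumption).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
f_true = lambda t: np.tanh(2.0 * t)    # equals 2 * sigmoid(4 t) - 1
y = np.where(rng.random(200) < (1.0 + f_true(x)) / 2.0, 1.0, -1.0)
print(in_confidence_region(f_true, x, y, rng=1))
print(in_confidence_region(lambda t: -np.tanh(2.0 * t), x, y, rng=1))
```

If the candidate equals the true regression function, the original and resampled label vectors are exchangeable given the inputs, so with random tie-breaking the rank is uniform on $1, \dots, m$ and the candidate is accepted with probability exactly $q/m$, for any sample size. The set of candidates in a model class that pass the test then forms a confidence region at that level.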