Exact characterization of ε-Safe Decision Regions for exponential family distributions and Multi Cost SVM approximation
This work addresses the need for trustworthy and reliable machine learning models by offering formal guarantees on prediction safety, which is crucial for applications requiring high confidence. Its algorithmic contribution is incremental, however, since it extends existing SVM methods.
The paper tackles the problem of providing probabilistic guarantees for classifier predictions. It introduces ε-Safe Decision Regions, proves that these regions are analytically determined when data come from exponential family distributions, and develops Multi Cost SVM, an algorithm that approximates them when the data do not follow such a distribution. Experiments and code are provided for validation.
Probabilistic guarantees on the predictions of data-driven classifiers are necessary to define models that can be considered reliable. This is a key requirement for modern machine learning, in which the quality of a system is measured in terms of trustworthiness, clearly dividing what is safe from what is unsafe. This paper works in exactly this direction. First, we introduce a formal definition of ε-Safe Decision Region, a subset of the input space in which the prediction of a target (safe) class is probabilistically guaranteed. Second, we prove that, when data come from exponential family distributions, the form of such a region is analytically determined and controllable by design parameters, i.e., the probability of sampling the target class and the confidence in the prediction. However, the assumption that data follow an exponential family distribution does not always hold. Motivated by this limitation, we developed Multi Cost SVM, an SVM-based algorithm that approximates the safe region and is also able to handle unbalanced data. The research is complemented by experiments and code available for reproducibility.
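To make the exponential-family claim concrete, the sketch below works through the simplest case we can think of: two one-dimensional Gaussian classes with a shared standard deviation. It is an illustration under our own assumptions, not the paper's construction. In particular, it reads "ε-safe" as the pointwise condition P(safe | x) ≥ 1 − ε, which may differ from the formal definition in the paper, and the names `safe_threshold` and `posterior_safe` are hypothetical. Under these assumptions the log-odds of the safe class is linear in x, so the region is a half-line whose boundary depends only on the class prior and on ε.

```python
import math


def posterior_safe(x, mu_safe, mu_unsafe, sigma, p_safe):
    """Posterior P(safe | x) for two Gaussians sharing the same sigma."""
    # The Gaussian normalisation constant is identical for both classes
    # and cancels in the ratio, so it is omitted here.
    def kernel(value, mu):
        return math.exp(-((value - mu) ** 2) / (2 * sigma**2))

    num = p_safe * kernel(x, mu_safe)
    return num / (num + (1 - p_safe) * kernel(x, mu_unsafe))


def safe_threshold(mu_safe, mu_unsafe, sigma, p_safe, eps):
    """Boundary of the region where P(safe | x) >= 1 - eps.

    Returns (threshold, direction), where direction is ">=" or "<=",
    meaning the region is {x : x direction threshold}.
    Assumption: pointwise reading of epsilon-safety (see lead-in).
    """
    if mu_safe == mu_unsafe:
        raise ValueError("class means must differ")
    # Log-odds of safe vs. unsafe is linear in x: slope * x + intercept.
    slope = (mu_safe - mu_unsafe) / sigma**2
    intercept = (
        math.log(p_safe / (1 - p_safe))
        - (mu_safe**2 - mu_unsafe**2) / (2 * sigma**2)
    )
    # P(safe | x) >= 1 - eps  <=>  log-odds >= log((1 - eps) / eps).
    target = math.log((1 - eps) / eps)
    threshold = (target - intercept) / slope
    # Dividing by a negative slope flips the inequality.
    direction = ">=" if slope > 0 else "<="
    return threshold, direction


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    t, d = safe_threshold(mu_safe=2.0, mu_unsafe=0.0, sigma=1.0,
                          p_safe=0.5, eps=0.05)
    print(f"safe region: x {d} {t:.3f}")
    print(f"posterior at boundary: {posterior_safe(t, 2.0, 0.0, 1.0, 0.5):.3f}")
```

Lowering ε or lowering the prior of the safe class pushes the boundary further into the safe class, which is the sense in which the two design parameters control the region in this toy setting.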