ST | IT | LG | Nov 12, 2022

Empirical Risk Minimization with Relative Entropy Regularization

arXiv:2211.06617v5 | 39 citations | h-index: 25
Originality: Incremental advance
AI Analysis

This work provides theoretical insights into regularization for machine learning, but it is incremental as it extends existing ERM-RER frameworks with a more flexible reference measure assumption.

The paper investigates empirical risk minimization with relative entropy regularization (ERM-RER) under the generalized assumption that the reference measure is σ-finite rather than necessarily a probability measure. It shows that the solution, when it exists, is a unique probability measure that carries a probably-approximately-correct guarantee for the ERM problem, and that, for a fixed dataset and under a specific condition, the empirical risk is sub-Gaussian when models are sampled from that solution.

The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated under the assumption that the reference measure is a σ-finite measure, and not necessarily a probability measure. Under this assumption, which leads to a generalization of the ERM-RER problem allowing a larger degree of flexibility for incorporating prior knowledge, numerous relevant properties are stated. Among these properties, the solution to this problem, if it exists, is shown to be a unique probability measure, mutually absolutely continuous with the reference measure. Such a solution exhibits a probably-approximately-correct guarantee for the ERM problem independently of whether the latter possesses a solution. For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable when the models are sampled from the solution to the ERM-RER problem. The generalization capabilities of the solution to the ERM-RER problem (the Gibbs algorithm) are studied via the sensitivity of the expected empirical risk to deviations from such a solution towards alternative probability measures. Finally, an interesting connection between sensitivity, generalization error, and lautum information is established.
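
As a rough illustration of the abstract's first claims, the sketch below works on a finite model set, where a σ-finite reference measure reduces to a vector of positive weights that need not sum to one. It assumes the standard Gibbs form of the solution, in which the weight of each model is proportional to its reference weight times exp(-risk/λ). That closed form is not spelled out in the abstract above, and the function names, risk values, reference weights, and λ are all hypothetical choices made for the example.

```python
import numpy as np

def gibbs_solution(risks, ref_weights, lam):
    """Gibbs measure on a finite model set: p_i proportional to q_i * exp(-risk_i / lam)."""
    risks = np.asarray(risks, dtype=float)
    q = np.asarray(ref_weights, dtype=float)
    # Subtracting the minimum risk only improves numerical stability;
    # the shift cancels when the weights are normalized.
    unnorm = q * np.exp(-(risks - risks.min()) / lam)
    return unnorm / unnorm.sum()

def objective(p, risks, ref_weights, lam):
    """Expected empirical risk plus lam * D(p || q); q may be unnormalized."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(ref_weights, dtype=float)
    risks = np.asarray(risks, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing to the relative entropy
    rel_entropy = np.sum(p[mask] * np.log(p[mask] / q[mask]))
    return float(np.dot(p, risks) + lam * rel_entropy)

# Hypothetical toy setup: four models with made-up empirical risks.
risks = np.array([0.9, 0.4, 0.1, 0.6])
# Reference "measure" with total mass 10.5, so it is not a probability measure.
ref_weights = np.array([2.0, 5.0, 0.5, 3.0])
lam = 0.5  # regularization factor (assumed value)

p_star = gibbs_solution(risks, ref_weights, lam)
best = objective(p_star, risks, ref_weights, lam)

# Compare against randomly drawn alternative probability measures.
rng = np.random.default_rng(0)
alternatives = rng.dirichlet(np.ones(len(risks)), size=1000)
gaps = [objective(p, risks, ref_weights, lam) - best for p in alternatives]
min_gap = min(gaps)

print("Gibbs solution:", p_star)
print("Sums to one:", np.isclose(p_star.sum(), 1.0))
print("Smallest objective gap over alternatives:", min_gap)
```

In this finite setting the result is a probability vector even though the reference weights are not normalized, and it is strictly positive wherever the reference weights are, which mirrors the mutual absolute continuity stated in the abstract. The gap to any alternative measure equals λ times the relative entropy of that alternative with respect to the Gibbs solution, so it should never be negative beyond floating-point error.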
