Supervised Learning with General Risk Functionals
This work addresses the need for risk-sensitive learning by offering foundational guarantees that apply to a broad range of risk measures, though its contribution is incremental in that it extends prior, functional-specific results to a more general framework.
The paper provides uniform convergence guarantees for a broad class of Hölder risk functionals beyond the expected loss. It establishes the first uniform convergence results for estimating the CDF of the loss distribution, which justify empirical risk minimization, and develops practical gradient-based methods for minimizing distortion risks.
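The chain of reasoning (uniform CDF estimation, then Hölder continuity, then uniform risk estimation) can be written out as below. This is a sketch under assumed notation: hypothesis h in a class H, loss ℓ, sample (x_i, y_i) for i = 1..n, Hölder constant L and exponent γ, and the sup-norm as the distance between CDFs. The paper's exact definitions may differ. The distortion-risk form shown is one common convention for losses bounded in [0, B].

```latex
% Empirical and population CDFs of the loss of hypothesis h (assumed notation)
\hat F_{h,n}(z) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{\ell(h; x_i, y_i) \le z\},
\qquad F_h(z) = \Pr\big(\ell(h; X, Y) \le z\big)

% Holder risk functional (sup-norm assumed): closeness in CDF implies closeness in risk
|\rho(F) - \rho(G)| \le L\,\|F - G\|_\infty^{\gamma}

% Hence uniform CDF convergence transfers to every h and every rho with the same (L, gamma)
\sup_{h \in \mathcal{H}} |\rho(\hat F_{h,n}) - \rho(F_h)|
  \le L \Big(\sup_{h \in \mathcal{H}} \|\hat F_{h,n} - F_h\|_\infty\Big)^{\gamma}

% Distortion risk, losses in [0, B], g nondecreasing on [0,1] with g(0)=0, g(1)=1
\rho_g(F) = \int_0^B g\big(1 - F(z)\big)\,dz

% If g is gamma-Holder with constant L_g, then rho_g is Holder with L = B L_g
|\rho_g(F) - \rho_g(G)| \le \int_0^B L_g\,|F(z) - G(z)|^{\gamma}\,dz \le B L_g\,\|F - G\|_\infty^{\gamma}
```

With g(p) = p this recovers the expected loss, and with g(p) = min(p/α, 1) it gives the upper-tail conditional value at risk at level α, whose distortion is Lipschitz with constant 1/α.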
Standard uniform convergence results bound the generalization gap of the expected loss over a hypothesis class. The emergence of risk-sensitive learning requires generalization guarantees for functionals of the loss distribution beyond the expectation. While prior works focus on uniform convergence of particular functionals, our work provides uniform convergence for a general class of Hölder risk functionals, for which closeness in the Cumulative Distribution Function (CDF) implies closeness in risk. We establish the first uniform convergence results for estimating the CDF of the loss distribution, yielding guarantees that hold simultaneously over all Hölder risk functionals and over all hypotheses. With empirical risk minimization thus justified, we develop practical gradient-based methods for minimizing distortion risks (a widely studied subset of Hölder risks that subsumes the spectral risks and includes the mean, conditional value at risk, cumulative prospect theory risks, and others) and provide convergence guarantees for these methods. In experiments, we demonstrate the efficacy of our learning procedure both in settings where the uniform convergence results hold and in high-dimensional settings with deep networks.
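To illustrate what gradient-based minimization of a distortion risk can look like, here is a minimal sketch. It assumes nonnegative losses and the convention ρ_g(F) = ∫ g(1 − F(z)) dz. Under that convention, the risk of the empirical CDF is a weighted sum of sorted losses, with weight g((n−i+1)/n) − g((n−i)/n) on the i-th smallest loss. A descent direction is then obtained by reweighting per-sample loss gradients according to their current rank. Away from ties in the loss ranking this is the exact gradient. The linear model, squared loss, step size, and all function names (`distortion_weights`, `empirical_distortion_risk`, `make_g_cvar`, `minimize_distortion_risk`) are hypothetical choices for illustration. This is a generic rank-weighted (sub)gradient scheme, not necessarily the paper's algorithm.

```python
import numpy as np


def distortion_weights(n, g):
    """Weight on the i-th smallest of n losses: g((n-i+1)/n) - g((n-i)/n).

    Assumes g is nondecreasing on [0, 1] with g(0) = 0 and g(1) = 1,
    so the weights are nonnegative and telescope to a sum of 1.
    """
    i = np.arange(1, n + 1)
    return g((n - i + 1) / n) - g((n - i) / n)


def empirical_distortion_risk(losses, g):
    """Distortion risk of the empirical CDF of nonnegative losses."""
    z = np.sort(np.asarray(losses, dtype=float))
    return float(np.dot(distortion_weights(len(z), g), z))


def g_mean(p):
    # Identity distortion: recovers the ordinary empirical mean.
    return p


def make_g_cvar(alpha):
    # Upper-tail CVaR at level alpha: averages the worst alpha-fraction of losses.
    return lambda p: np.minimum(p / alpha, 1.0)


def distortion_risk_subgradient(theta, X, y, g):
    """(Sub)gradient of the empirical distortion risk of squared losses for a linear model."""
    residuals = X @ theta - y
    losses = residuals ** 2
    order = np.argsort(losses)  # sample indices from smallest to largest loss
    w = np.empty(len(losses))
    w[order] = distortion_weights(len(losses), g)  # each sample gets the weight of its rank
    # Gradient of sum_i w_i * (x_i . theta - y_i)^2 with the rank weights held fixed.
    return 2.0 * X.T @ (w * residuals)


def minimize_distortion_risk(X, y, g, lr=0.05, steps=1000):
    """Plain fixed-step (sub)gradient descent starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta = theta - lr * distortion_risk_subgradient(theta, X, y, g)
    return theta


if __name__ == "__main__":
    # Synthetic data: intercept 1, slope 2, Gaussian noise (illustrative only).
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)
    g_cvar = make_g_cvar(0.1)
    for name, g in [("mean", g_mean), ("CVaR_0.1", g_cvar)]:
        theta = minimize_distortion_risk(X, y, g)
        tail_risk = empirical_distortion_risk((X @ theta - y) ** 2, g_cvar)
        print(name, theta, tail_risk)
```

Holding the rank weights fixed within each step keeps the update cheap: it adds one sort per step on top of the usual per-sample gradients. With the identity distortion, the procedure reduces to ordinary least-squares gradient descent. In a deep-learning setting the same idea applies by multiplying each per-sample loss by its rank weight before backpropagation.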