ML | LG | PR | Feb 8, 2022

Understanding the bias-variance tradeoff of Bregman divergences

arXiv:2202.04167v2 | 12 citations
AI Analysis

This work provides incremental theoretical insights into machine learning optimization for researchers focusing on loss functions and generalization.

The paper extends the bias-variance tradeoff to Bregman divergences by interpreting the central prediction as a mean in a dual space, leading to results such as a generalized law of total variance and ensembling operations that reduce variance without affecting bias.

This paper builds upon the work of Pfau (2013), which generalized the bias-variance tradeoff to any Bregman divergence loss function. Pfau (2013) showed that for Bregman divergences, the bias and variance are defined with respect to a central label, defined as the mean of the label variable, and a central prediction, of a more complex form. We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself. Viewing the bias-variance tradeoff through operations taken in dual space, we subsequently derive several results of interest. In particular, (a) the variance terms satisfy a generalized law of total variance; (b) if a source of randomness cannot be controlled, its contribution to the bias and variance has a closed form; (c) there exist natural ensembling operations in the label and prediction spaces which reduce the variance and do not affect the bias.
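As a rough numerical illustration of the abstract's claims (not code from the paper), the sketch below checks the decomposition for one concrete Bregman divergence. It assumes the generator F(p) = sum(p log p - p) on positive vectors, whose divergence is the generalized KL divergence and whose gradient is log p, so the dual-space mean of the predictions is their elementwise geometric mean. The function name `gen_kl`, the synthetic uniform data, and the ensemble size `k` are all illustrative assumptions. Labels and predictions are treated as independent, and expectations are taken over the empirical samples.

```python
import numpy as np

def gen_kl(p, q):
    """Generalized KL divergence: the Bregman divergence of F(p) = sum(p log p - p)."""
    return np.sum(p * np.log(p / q) - p + q, axis=-1)

rng = np.random.default_rng(0)
labels = rng.uniform(0.5, 2.0, size=(200, 3))  # synthetic samples of the label Y
preds = rng.uniform(0.5, 2.0, size=(300, 3))   # synthetic samples of the prediction, independent of Y

# Central label: ordinary mean. Central prediction: mean taken in dual space,
# i.e. average grad F(q) = log q, then map back with exp.
central_label = labels.mean(axis=0)
central_pred = np.exp(np.log(preds).mean(axis=0))

# Expected loss over every (label, prediction) pair, and its three components.
expected_loss = gen_kl(labels[:, None, :], preds[None, :, :]).mean()
noise = gen_kl(labels, central_label).mean()
bias = gen_kl(central_label, central_pred)
variance = gen_kl(central_pred, preds).mean()
print("loss vs. noise + bias + variance:", expected_loss, noise + bias + variance)

# Ensembling in prediction space: average k predictions in dual space.
k = 5
ensembles = np.exp(np.log(preds).reshape(-1, k, 3).mean(axis=1))
central_pred_ens = np.exp(np.log(ensembles).mean(axis=0))
bias_ens = gen_kl(central_label, central_pred_ens)
variance_ens = gen_kl(central_pred_ens, ensembles).mean()
print("bias before/after ensembling:", bias, bias_ens)
print("variance before/after ensembling:", variance, variance_ens)
```

Under these assumptions the expected loss matches the sum of the three terms up to floating-point error, the dual-space ensemble leaves the central prediction (and hence the bias) unchanged, and the variance term does not increase, in line with claim (c).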

