ML LG PR

Feb 8, 2022

Understanding the bias-variance tradeoff of Bregman divergences

Ben Adlam, Neha Gupta, Zelda Mariet, Jamie Smith

arXiv:2202.04167v2

6.712 citations

Originality Synthesis-oriented

AI Analysis

This work offers incremental theoretical insight into the bias-variance decomposition of Bregman divergence losses, of interest to researchers studying loss functions and generalization.

The paper extends the bias-variance tradeoff to Bregman divergences by interpreting the central prediction as a mean in a dual space, leading to results such as a generalized law of total variance and ensembling operations that reduce variance without affecting bias.

This paper builds upon the work of Pfau (2013), which generalized the bias-variance tradeoff to any Bregman divergence loss function. Pfau (2013) showed that for Bregman divergences, the bias and variance are defined with respect to a central label, defined as the mean of the label variable, and a central prediction, of a more complex form. We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself. Viewing the bias-variance tradeoff through operations taken in dual space, we subsequently derive several results of interest. In particular, (a) the variance terms satisfy a generalized law of total variance; (b) if a source of randomness cannot be controlled, its contribution to the bias and variance has a closed form; (c) there exist natural ensembling operations in the label and prediction spaces which reduce the variance and do not affect the bias.
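The dual-space reading of the central prediction can be checked numerically. The sketch below is illustrative only and is not the authors' code: it assumes the generalized KL divergence (the Bregman divergence generated by F(x) = sum(x log x - x), whose gradient is the elementwise log), synthetic positive labels and predictions drawn independently, and hypothetical helper names `gen_kl` and `dual_mean`. Under those assumptions the dual mean is a geometric mean, the expected loss splits into noise, bias, and variance, and averaging predictions in dual space leaves the central prediction unchanged while not increasing the variance term.

```python
import numpy as np

def gen_kl(y, p):
    """Generalized KL divergence: Bregman divergence of F(x) = sum(x log x - x)."""
    return np.sum(y * np.log(y / p) - y + p, axis=-1)

def dual_mean(p):
    """Mean taken in dual space. Here grad F = log, so this is a geometric mean."""
    return np.exp(np.mean(np.log(p), axis=0))

# Synthetic, independent labels and predictions (an assumption for illustration).
rng = np.random.default_rng(0)
labels = rng.uniform(0.5, 2.0, size=(200, 3))
preds = rng.uniform(0.5, 2.0, size=(300, 3))

central_label = labels.mean(axis=0)   # ordinary (primal) mean of the labels
central_pred = dual_mean(preds)       # dual-space mean of the predictions

# Expected loss over every (label, prediction) pair, i.e. the empirical
# product distribution, so labels and predictions are independent.
expected_loss = gen_kl(labels[:, None, :], preds[None, :, :]).mean()

noise = gen_kl(labels, central_label).mean()
bias = gen_kl(central_label, central_pred)
variance = gen_kl(central_pred, preds).mean()

# Ensembling in dual space: geometric mean over disjoint groups of 3 predictions.
groups = preds.reshape(100, 3, 3)
ensembled = np.exp(np.log(groups).mean(axis=1))
ensembled_central = dual_mean(ensembled)
ensembled_variance = gen_kl(ensembled_central, ensembled).mean()

print("loss:", expected_loss, "noise + bias + variance:", noise + bias + variance)
print("variance before / after ensembling:", variance, ensembled_variance)
```

Because the ensembled predictions share the same dual mean as the originals, the bias term is identical before and after ensembling; only the variance term changes.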
