ML LG, Jan 18, 2023

An Analysis of Loss Functions for Binary Classification and Regression

arXiv:2301.07638v2, 1 citation, h-index: 23

Originality: Synthesis-oriented

AI Analysis

This work provides theoretical insights into loss function consistency and outlier sensitivity, which is useful for researchers in machine learning and statistics, though it is incremental as it builds on existing frameworks.

This paper analyzes margin-based loss functions for binary classification and regression, showing that many such functions estimate scores equivalent to weighted log-likelihood scores and providing a characterization for consistent loss functions. It uses this to construct a new Huber-type loss function and demonstrates that minimizing exponential loss is equivalent to minimizing squared standardized logistic regression residuals, offering new interpretations for AdaBoost.

This paper explores connections between margin-based loss functions and consistency in binary classification and regression applications. It is shown that a large class of margin-based loss functions for binary classification/regression results in estimating scores equivalent to log-likelihood scores weighted by an even function. A simple characterization of conformable (consistent) loss functions is given, which allows for straightforward comparison of different losses, including exponential loss, logistic loss, and others. The characterization is used to construct a new Huber-type loss function for the logistic model. A simple relation between the margin and standardized logistic regression residuals is derived, demonstrating that all margin-based losses can be viewed as loss functions of squared standardized logistic regression residuals. The relation provides new, straightforward interpretations of exponential and logistic loss, and aids in understanding why exponential loss is sensitive to outliers. In particular, it is shown that minimizing empirical exponential loss is equivalent to minimizing the sum of squared standardized logistic regression residuals. The relation also provides new insight into the AdaBoost algorithm.
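The stated equivalence between exponential loss and squared standardized residuals can be checked numerically. The sketch below is illustrative only and rests on assumptions not confirmed by the abstract: labels y in {-1, +1}, a score f with p = 1/(1 + exp(-f)), and "standardized residual" taken to mean the Pearson residual (y01 - p)/sqrt(p(1 - p)), where y01 = (y + 1)/2. Under those assumptions, y = +1 gives a squared residual of (1 - p)/p = exp(-f) and y = -1 gives p/(1 - p) = exp(f), so the squared residual equals exp(-y f) in both cases. The paper's own scaling of the score may differ (for example by a factor of 2), and the function names here are hypothetical.

```python
import math
import random


def sigmoid(f):
    """Logistic function mapping a real score f to a probability."""
    return 1.0 / (1.0 + math.exp(-f))


def exponential_loss(y, f):
    """Exponential loss exp(-y * f) for a label y in {-1, +1} and score f."""
    return math.exp(-y * f)


def squared_pearson_residual(y, f):
    """Squared Pearson residual under the ASSUMED model p = sigmoid(f)."""
    p = sigmoid(f)
    y01 = (y + 1) / 2  # map {-1, +1} to {0, 1}
    return (y01 - p) ** 2 / (p * (1.0 - p))


# Synthetic labels and scores, generated only to exercise the identity.
random.seed(0)
samples = [(random.choice([-1, 1]), random.uniform(-3.0, 3.0)) for _ in range(1000)]

total_exp = sum(exponential_loss(y, f) for y, f in samples)
total_res = sum(squared_pearson_residual(y, f) for y, f in samples)
max_gap = max(
    abs(exponential_loss(y, f) - squared_pearson_residual(y, f)) for y, f in samples
)

print(f"sum of exponential losses:        {total_exp:.6f}")
print(f"sum of squared Pearson residuals: {total_res:.6f}")
print(f"largest pointwise difference:     {max_gap:.3e}")
```

Because the two quantities agree point by point under these assumptions, a badly misclassified observation (large negative margin y f) contributes an exponentially large squared residual, which is one way to read the abstract's remark about outlier sensitivity.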
