The fundamental nature of the log loss function
This work provides a theoretical foundation for choosing a loss function in probabilistic prediction. The contribution is incremental, but it clarifies a key property of the log loss function for researchers in algorithmic randomness and machine learning.
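The paper addresses the problem of comparing prediction algorithms under different proper loss functions. It shows that the log loss function is the most selective: any prediction algorithm that is optimal for a given data sequence under the log loss function is also optimal under any computable proper mixable loss function. The same does not hold for the other standard loss functions: there is a data sequence and a prediction algorithm that is optimal for that sequence under either the Brier or the spherical loss function but not under the log loss function.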
The standard loss functions used in the literature on probabilistic prediction are the log loss function, the Brier loss function, and the spherical loss function; however, any computable proper loss function can be used for comparison of prediction algorithms. This note shows that the log loss function is most selective in that any prediction algorithm that is optimal for a given data sequence (in the sense of the algorithmic theory of randomness) under the log loss function will be optimal under any computable proper mixable loss function; on the other hand, there is a data sequence and a prediction algorithm that is optimal for that sequence under either of the two other standard loss functions but not under the log loss function.
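For readers who want the three standard loss functions in concrete form, the following is a minimal sketch for the binary case, not code from the note. It assumes outcomes in {0, 1} and a prediction p equal to the probability of outcome 1. The helper names `expected_loss` and `best_prediction` are hypothetical, and the Brier and spherical losses are written in one common normalization (other texts scale them differently). The grid search only checks properness numerically, meaning that the expected loss is minimized by predicting the true probability.

```python
import math


def log_loss(p, y):
    """Log loss: minus the log of the probability assigned to the outcome y."""
    q = p if y == 1 else 1.0 - p
    return -math.log(q)


def brier_loss(p, y):
    """Brier loss in the squared-error convention (some texts use twice this)."""
    return (y - p) ** 2


def spherical_loss(p, y):
    """Spherical loss: 1 minus the outcome's probability over the Euclidean norm."""
    q = p if y == 1 else 1.0 - p
    return 1.0 - q / math.sqrt(p * p + (1.0 - p) ** 2)


def expected_loss(loss, p, q):
    """Expected loss of predicting p when outcome 1 has true probability q."""
    return q * loss(p, 1) + (1.0 - q) * loss(p, 0)


def best_prediction(loss, q, grid_size=999):
    """Grid point in (0, 1) minimizing the expected loss (illustrative only)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda p: expected_loss(loss, p, q))


if __name__ == "__main__":
    true_q = 0.3  # assumed true probability of outcome 1
    for loss in (log_loss, brier_loss, spherical_loss):
        # For a proper loss the minimizer should coincide with true_q.
        print(loss.__name__, best_prediction(loss, true_q))
    # Log loss grows without bound as the predicted probability of the
    # realized outcome goes to 0; the other two stay bounded.
    for loss in (log_loss, brier_loss, spherical_loss):
        print(loss.__name__, loss(1e-6, 1))
```

In this sketch all three losses are minimized in expectation at the true probability, which is what properness requires. The second loop shows that log loss is unbounded on near-zero predictions while the Brier and spherical losses are bounded. That contrast is offered only as intuition for why log loss might be the more demanding criterion; it is not the argument of the note, which is stated in terms of the algorithmic theory of randomness.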