Statistical Properties of the log-cosh Loss Function Used in Machine Learning
The paper provides foundational statistical insights for machine learning practitioners who use the log-cosh loss, but it is incremental: it analyzes an existing function and does not introduce new methods.
This paper addresses the lack of statistical analysis for the log-cosh loss function by deriving the distribution from which the loss arises and comparing it to the Cauchy distribution. It examines properties such as the pdf, cdf, likelihood function, and Fisher information, and it applies the log-cosh function to quantile regression, with comparisons to other robust estimators.
This paper analyzes a popular loss function used in machine learning called the log-cosh loss function. A number of papers have been published using this loss function but, to date, no statistical analysis has been presented in the literature. In this paper, we present the distribution function from which the log-cosh loss arises. We compare it to a similar distribution, the Cauchy distribution, and carry out various statistical procedures that characterize its properties. In particular, we examine its associated pdf, cdf, likelihood function, and Fisher information. We consider the Cauchy and Cosh distributions side by side, along with the MLE of the location parameter and its asymptotic bias, asymptotic variance, and confidence intervals. We also provide a comparison of robust estimators from several other loss functions, including the Huber loss function and the rank dispersion function. Further, we examine the use of the log-cosh function for quantile regression. In particular, we identify a quantile distribution function from which a maximum likelihood estimator for quantile regression can be derived. Finally, we compare a quantile M-estimator based on log-cosh with robust monotonicity against another approach to quantile regression based on convolutional smoothing.
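As a rough illustration of the link between the loss and a distribution, the sketch below assumes the density is obtained by normalizing exp(-log cosh(x)), which gives 1/(π cosh(x)) with cdf (2/π) arctan(e^x) in the standard (location 0, scale 1) case. The function names `log_cosh`, `cosh_pdf`, and `cosh_cdf` are hypothetical and chosen here for illustration; the paper's own parameterization may differ.

```python
import numpy as np

def log_cosh(x):
    """Numerically stable log(cosh(x)).

    Uses log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2),
    which avoids overflow of cosh for large |x|.
    """
    ax = np.abs(np.asarray(x, dtype=float))
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)

def cosh_pdf(x):
    """Assumed standard density: exp(-log cosh(x)) / pi = 1 / (pi * cosh(x))."""
    return np.exp(-log_cosh(x)) / np.pi

def cosh_cdf(x):
    """CDF matching cosh_pdf: (2 / pi) * arctan(exp(x))."""
    return (2.0 / np.pi) * np.arctan(np.exp(np.asarray(x, dtype=float)))

# Usage: evaluate the loss on a few residuals (illustrative values only).
residuals = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
losses = log_cosh(residuals)
```

For small residuals the loss behaves like r²/2, and for large residuals like |r| - log 2, which is the usual motivation for treating log-cosh as a smooth, robust alternative to squared error.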