An Elementary Proof of the Near Optimality of LogSumExp Smoothing
This work addresses a foundational mathematical problem in optimization and machine learning, offering incremental improvements in understanding smoothing techniques.
The paper tackles the problem of smoothing the max function in the infinity norm, showing that the classical LogSumExp function differs from the max by at most ln(d) and proving a lower bound of ~0.8145 ln(d) for overestimating smoothings, establishing LogSumExp as near-optimal up to constant factors, while also providing exactly optimal smoothings for small dimensions.
We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln(\sum^d_i\exp(x_i))$ provides a classical smoothing, differing from the max function in value by at most $\ln(d)$. We provide an elementary construction of a lower bound, establishing that every overestimating smoothing of the max function must differ by at least $\sim 0.8145\ln(d)$. Hence, LogSumExp is optimal up to small constant factors. However, in small dimensions, we provide stronger, exactly optimal smoothings attaining our lower bound, showing that the entropy-based LogSumExp approach to smoothing is not exactly optimal.