Exp-Concavity of Proper Composite Losses
This work is aimed at researchers and practitioners of online learning and seeks to reconcile strong theoretical guarantees with practical efficiency. It is somewhat incremental, since it builds on the existing notions of mixability and exp-concavity.
The paper addresses fast (constant in the horizon $T$) regret bounds in online prediction with expert advice by characterizing the exp-concavity of proper composite losses. The characterization shows that any β-mixable binary proper loss can be re-parameterized into a β-exp-concave composite loss with the same β (approximately, in the multi-class case). This combines the strong guarantees of the Aggregating Algorithm with the computational efficiency of the Weighted Average Algorithm.
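To make the gap between mixability and exp-concavity concrete, the following sketch numerically estimates the largest η for which $\exp(-\eta\,\ell(p, y))$ is concave in $p$. It assumes the binary square loss $\ell(p, y) = (y - p)^2$ with the identity link, predictions $p \in [0, 1]$ and outcomes $y \in \{0, 1\}$. This loss is known to be 2-mixable, but under the identity link it is only 1/2-exp-concave. The function names are hypothetical, and the check uses the analytic second derivative evaluated on a grid rather than anything from the paper.

```python
import math

def second_derivative_exp_neg(eta: float, p: float, y: int) -> float:
    """g''(p) for g(p) = exp(-eta * (y - p)**2): square loss with identity link (hypothetical helper)."""
    d = y - p
    return math.exp(-eta * d * d) * (4.0 * eta * eta * d * d - 2.0 * eta)

def is_exp_concave(eta: float, n_grid: int = 1001) -> bool:
    """Check g'' <= 0 on a grid of predictions p in [0, 1] for both outcomes y in {0, 1}."""
    for i in range(n_grid):
        p = i / (n_grid - 1)
        for y in (0, 1):
            if second_derivative_exp_neg(eta, p, y) > 1e-12:
                return False
    return True

def largest_exp_concavity(lo: float = 0.0, hi: float = 4.0, iters: int = 60) -> float:
    """Bisection for the largest eta with exp(-eta * loss) concave.

    Valid because eta-exp-concavity implies eta'-exp-concavity for every eta' < eta.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_exp_concave(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Analytically, concavity needs 4 * eta * (y - p)**2 <= 2 for all p, y, i.e. eta <= 1/2.
eta_star = largest_exp_concavity()
print(f"square loss, identity link: exp-concave up to eta ~ {eta_star:.4f}")
```

The estimate is 1/2, a factor of 4 below the mixability constant 2. The paper's result implies that some other link recovers exp-concavity at η = 2 for this binary proper loss. The sketch does not construct that link.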
The goal of online prediction with expert advice is to find a decision strategy that performs almost as well as the best expert in a given pool of experts, on any sequence of outcomes. This problem has been widely studied, and $O(\sqrt{T})$ and $O(\log{T})$ regret bounds can be achieved for convex losses (\cite{zinkevich2003online}) and strictly convex losses with bounded first and second derivatives (\cite{hazan2007logarithmic}), respectively. In special cases, such as the Aggregating Algorithm (\cite{vovk1995game}) with mixable losses and the Weighted Average Algorithm (\cite{kivinen1999averaging}) with exp-concave losses, it is possible to achieve $O(1)$ regret bounds, that is, bounds independent of $T$. \cite{van2012exp} has argued that mixability and exp-concavity are roughly equivalent under certain conditions. Thus, by understanding the underlying relationship between these two notions, we can obtain the best of both algorithms: the strong theoretical performance guarantees of the Aggregating Algorithm and the computational efficiency of the Weighted Average Algorithm. In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. Using this characterization and the mixability condition of proper losses (\cite{van2012mixability}), we show that it is possible to transform (re-parameterize) any $\beta$-mixable binary proper loss into a $\beta$-exp-concave composite loss with the same $\beta$. In the multi-class case, we propose an approximation approach for this transformation.
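The computational appeal of the Weighted Average Algorithm is that it simply averages expert predictions under exponential weights. The following minimal sketch assumes uniform prior weights over $N$ experts and the same binary square loss $(y - p)^2$, which is 1/2-exp-concave on $[0, 1]$. For an η-exp-concave loss, Jensen's inequality applied to the concave map $p \mapsto \exp(-\eta\,\ell(p, y))$ gives a total-loss bound of $\min_i L_i + \ln(N)/\eta$ on any outcome sequence. The function names and the random data are hypothetical illustrations, not the paper's experiments.

```python
import math
import random

def weighted_average_algorithm(expert_preds, outcomes, eta, loss):
    """Exponentially weighted averaging of expert predictions (hypothetical helper).

    Returns (learner's total loss, list of each expert's total loss).
    """
    n = len(expert_preds[0])
    cum = [0.0] * n  # cumulative loss of each expert
    learner = 0.0
    for preds, y in zip(expert_preds, outcomes):
        m = min(cum)  # shift by the minimum for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        p = sum(wi * pi for wi, pi in zip(w, preds)) / z  # weighted average prediction
        learner += loss(p, y)
        for i, pi in enumerate(preds):
            cum[i] += loss(pi, y)
    return learner, cum

def square_loss(p, y):
    return (y - p) ** 2

rng = random.Random(0)
T, N = 1000, 10
expert_preds = [[rng.random() for _ in range(N)] for _ in range(T)]
outcomes = [rng.randint(0, 1) for _ in range(T)]
eta = 0.5  # square loss on [0, 1] with binary outcomes is 1/2-exp-concave
learner, cum = weighted_average_algorithm(expert_preds, outcomes, eta, square_loss)
regret = learner - min(cum)
bound = math.log(N) / eta
print(f"regret = {regret:.3f}, bound ln(N)/eta = {bound:.3f}")
```

With η = 1/2 the guaranteed regret is $2\ln N$. The Aggregating Algorithm with the 2-mixable square loss guarantees $\ln(N)/2$ but needs a substitution step instead of a plain average. Reparameterizing the loss so that it becomes 2-exp-concave is what would let the averaging scheme above reach the tighter constant.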