Optimality of the Subgradient Algorithm in the Stochastic Setting
This work provides a universal algorithm for online learning on the simplex with broad applications, though the contribution is incremental because it builds on existing subgradient methods.
The paper demonstrates that the Subgradient algorithm achieves O(√N) regret for adversarial costs and O(1) pseudo-regret for i.i.d. costs in online learning on the simplex, showing it is universal and not a variant of Hedge.
We show that the Subgradient algorithm is universal for online learning on the simplex in the sense that it simultaneously achieves $O(\sqrt N)$ regret for adversarial costs and $O(1)$ pseudo-regret for i.i.d. costs. To the best of our knowledge, this is the first demonstration of a universal algorithm on the simplex that is not a variant of Hedge. Since Subgradient is a popular and widely used algorithm, our results have immediate broad application.
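To make the setting concrete, the following is a minimal sketch of one common form of the subgradient method on the simplex: the "lazy" variant, which plays the Euclidean projection of the scaled negative cumulative cost. The abstract does not specify the exact update or step size, so the lazy form, the names `project_simplex`, `lazy_subgradient`, and `regret`, and the `step` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0                  # running sums minus the target mass
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept in the support
    tau = css[rho] / (rho + 1)                # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def lazy_subgradient(costs, step):
    """Play the projection of -step * (cumulative cost) in every round.

    costs: array of shape (N, d), one linear cost vector per round.
    step:  assumed constant step size (hypothetical choice, e.g. 1/sqrt(N)).
    Returns the (N, d) array of points played on the simplex.
    """
    n_rounds, dim = costs.shape
    cumulative = np.zeros(dim)
    plays = np.empty((n_rounds, dim))
    for t in range(n_rounds):
        plays[t] = project_simplex(-step * cumulative)
        cumulative += costs[t]                # cost is revealed after playing
    return plays


def regret(costs, plays):
    """Total incurred cost minus the cost of the best fixed vertex in hindsight."""
    incurred = np.sum(plays * costs)
    best_fixed = np.min(costs.sum(axis=0))
    return incurred - best_fixed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 1000, 3
    # i.i.d. costs in [0, 1] with distinct means, so one action is best on average.
    costs = np.clip(rng.normal([0.3, 0.5, 0.7], 0.1, size=(N, d)), 0.0, 1.0)
    plays = lazy_subgradient(costs, step=1.0 / np.sqrt(N))
    print("regret:", regret(costs, plays))
```

Unlike Hedge, which reweights actions multiplicatively, the Euclidean projection can assign exactly zero weight to an action, so the iterate can sit on a vertex of the simplex once the cumulative costs separate.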