Better Bounds for the Distributed Experts Problem
This work addresses communication efficiency in distributed learning for experts across servers, offering incremental improvements over existing methods.
The paper tackles the distributed experts problem by developing a protocol that achieves regret roughly R ≳ 1/(√T·poly log(nsT)) with communication complexity O((n/R² + s/R²)·max(s¹⁻²/ᵖ,1)·poly log(nsT)) bits, improving on prior bounds.
In this paper, we study the distributed experts problem, where $n$ experts are distributed across $s$ servers for $T$ timesteps. The loss of each expert at each time $t$ is the $\ell_p$ norm of the vector that consists of the losses of the expert at each of the $s$ servers at time $t$. The goal is to minimize the regret $R$, i.e., the loss of the distributed protocol compared to the loss of the best expert, amortized over the all $T$ times, while using the minimum amount of communication. We give a protocol that achieves regret roughly $R\gtrsim\frac{1}{\sqrt{T}\cdot\text{poly}\log(nsT)}$, using $\mathcal{O}\left(\frac{n}{R^2}+\frac{s}{R^2}\right)\cdot\max(s^{1-2/p},1)\cdot\text{poly}\log(nsT)$ bits of communication, which improves on previous work.