Online Learning via the Differential Privacy Lens
This work provides a theoretical framework for improving regret analysis in online learning, which is incremental as it builds on existing differential privacy tools to address known bottlenecks in algorithm analysis.
The paper tackled the problem of analyzing online learning algorithms by introducing a stability notion derived from differential privacy, enabling refined regret bounds for problems like online convex optimization and achieving first-order regret bounds for follow-the-perturbed-leader algorithms.
In this paper, we use differential privacy as a lens to examine online learning in both full and partial information settings. The differential privacy framework is, at heart, less about privacy and more about algorithmic stability, and thus has found application in domains well beyond those where information security is central. Here we develop an algorithmic property called one-step differential stability which facilitates a more refined regret analysis for online learning methods. We show that tools from the differential privacy literature can yield regret bounds for many interesting online learning problems including online convex optimization and online linear optimization. Our stability notion is particularly well-suited for deriving first-order regret bounds for follow-the-perturbed-leader algorithms, something that all previous analyses have struggled to achieve. We also generalize the standard max-divergence to obtain a broader class called Tsallis max-divergences. These define stronger notions of stability that are useful in deriving bounds in partial information settings such as multi-armed bandits and bandits with experts.