Differentially Private Multi-Armed Bandits in the Shuffle Model
This work addresses privacy-preserving decision-making in sequential settings like online advertising or healthcare, offering a novel algorithm that bridges the gap between privacy and performance, though it is incremental in improving upon existing shuffle model methods.
The paper tackles the problem of achieving differential privacy in multi-armed bandits within the shuffle model, resulting in a distribution-dependent regret of O((∑_{a:Δ_a>0} log T/Δ_a) + (k√(log(1/δ)) log T)/ε) and a distribution-independent regret of O(√(kT log T) + (k√(log(1/δ)) log T)/ε), nearly matching centralized model algorithms and outperforming local model ones.
We give an $(\varepsilon,δ)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a\in [k]:Δ_a>0}\frac{\log T}{Δ_a}\right)+\frac{k\sqrt{\log\frac{1}δ}\log T}{\varepsilon}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}δ}\log T}{\varepsilon}\right)$, where $T$ is the number of rounds, $Δ_a$ is the suboptimality gap of the arm $a$, and $k$ is the total number of arms. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.