LGMay 5, 2025

Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

MILA
arXiv:2505.02383v2h-index: 6ICML
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficient privacy-regret trade-offs in bandit algorithms, which is incremental but offers a new parametrized approach.

The paper tackles the problem of balancing privacy and regret in differentially private stochastic bandit problems by proposing DP-TS-UCB, a novel algorithm that achieves a trade-off with a regret bound of O(K ln^{α+1}(T)/Δ) and satisfies GDP privacy guarantees.

We address differentially private stochastic bandit problems from the angles of exploring the deep connections among Thompson Sampling with Gaussian priors, Gaussian mechanisms, and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private bandit algorithm that enables to trade off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-α)}\right)$-GDP and enjoys an $O \left(K\ln^{α+1}(T)/Δ\right)$ regret bound, where $α\in [0,1]$ controls the trade-off between privacy and regret. Theoretically, our DP-TS-UCB relies on anti-concentration bounds of Gaussian distributions and links exploration mechanisms in Thompson Sampling-based algorithms and Upper Confidence Bound-based algorithms, which may be of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes