LGCRJul 3, 2023

Thompson Sampling under Bernoulli Rewards with Local Differential Privacy

arXiv:2307.00863v1h-index: 14
Originality Synthesis-oriented
AI Analysis

This work addresses privacy-preserving decision-making in bandit problems, which is incremental as it applies existing mechanisms to Thompson Sampling with theoretical analysis.

The paper tackled the problem of minimizing regret in multi-armed bandit settings while ensuring local differential privacy, deriving stochastic regret bounds for Thompson Sampling under three privatizing mechanisms (linear, quadratic, exponential) with fixed privacy budgets and simulating their convergence.

This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. Given a fixed privacy budget $ε$, we consider three privatizing mechanisms under Bernoulli scenario: linear, quadratic and exponential mechanisms. Under each mechanism, we derive stochastic regret bound for Thompson Sampling algorithm. Finally, we simulate to illustrate the convergence of different mechanisms under different privacy budgets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes