Thompson Sampling under Bernoulli Rewards with Local Differential Privacy
This work addresses privacy-preserving decision-making in bandit problems, which is incremental as it applies existing mechanisms to Thompson Sampling with theoretical analysis.
The paper tackled the problem of minimizing regret in multi-armed bandit settings while ensuring local differential privacy, deriving stochastic regret bounds for Thompson Sampling under three privatizing mechanisms (linear, quadratic, exponential) with fixed privacy budgets and simulating their convergence.
This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. Given a fixed privacy budget $ε$, we consider three privatizing mechanisms under Bernoulli scenario: linear, quadratic and exponential mechanisms. Under each mechanism, we derive stochastic regret bound for Thompson Sampling algorithm. Finally, we simulate to illustrate the convergence of different mechanisms under different privacy budgets.