A study of Thompson Sampling with Parameter h
This is an incremental improvement for researchers in bandit algorithms, focusing on parameter sensitivity.
The authors modify the Thompson Sampling algorithm for stochastic multi-armed bandits by introducing a parameter h that adjusts how much weight the selection rule gives to each arm's probability of being the current best arm. They show that the algorithm's optimality remains robust within a specific range of h for two-armed bandits.
Thompson Sampling is a well-known Bayesian algorithm for the stochastic multi-armed bandit problem. At each time step, the algorithm chooses each arm with probability proportional to the posterior probability that it is the current best arm. We modify the strategy by introducing a parameter h, which alters the importance of the probability of an arm being the current best arm. We show that the optimality of Thompson Sampling is robust to this perturbation within a range of parameter values for two-armed bandits.
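The abstract does not state how h enters the selection rule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes Bernoulli rewards, independent Beta(1, 1) priors, and a selection probability proportional to p_i ** h, where p_i is a Monte Carlo estimate of the probability that arm i is currently best. Under that assumed form, h = 1 leaves the probabilities unchanged and so corresponds to the unmodified rule. All function names and the restriction h > 0 are hypothetical.

```python
import random


def best_arm_probabilities(successes, failures, n_samples=200, rng=random):
    """Monte Carlo estimate of P(arm i is best) under independent Beta posteriors.

    Assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm.
    """
    k = len(successes)
    wins = [0] * k
    for _ in range(n_samples):
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


def tempered_selection_probabilities(probs, h):
    """ASSUMED form of the modification: weight arm i by probs[i] ** h, then renormalize.

    h = 1 returns probs unchanged; h > 1 sharpens toward the likely best arm;
    0 < h < 1 flattens toward uniform selection.
    """
    if h <= 0:
        raise ValueError("this sketch assumes h > 0")
    weights = [p ** h for p in probs]
    total = sum(weights)
    if total == 0:
        # Guard against numerical underflow: fall back to uniform selection.
        return [1.0 / len(probs)] * len(probs)
    return [w / total for w in weights]


def run(true_means, h, horizon, seed=0):
    """Simulate the sketched rule and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        probs = best_arm_probabilities(successes, failures, rng=rng)
        selection = tempered_selection_probabilities(probs, h)
        arm = rng.choices(range(k), weights=selection)[0]
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Two-armed example with hypothetical means; compare a few values of h.
    for h in (0.5, 1.0, 2.0):
        print(h, run([0.3, 0.7], h=h, horizon=200))
```

Estimating the best-arm probabilities explicitly is what makes h easy to apply here. The usual implementation of Thompson Sampling draws one posterior sample per arm and plays the arg max, which never computes those probabilities.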