LG, IT · Oct 5, 2017

A study of Thompson Sampling with Parameter h

arXiv:1710.02174v1

Originality: Synthesis-oriented

AI Analysis

This is an incremental contribution of interest to researchers working on bandit algorithms. It focuses on how sensitive Thompson Sampling is to a tuning parameter.

The authors modify the Thompson Sampling algorithm for stochastic multi-armed bandits by introducing a parameter h that adjusts how strongly each arm's probability of being the current best arm influences its selection. They show that, for two-arm bandits, Thompson Sampling's optimality is preserved for h within a certain range.

The Thompson Sampling algorithm is a well-known Bayesian algorithm for solving the stochastic multi-armed bandit problem. At each time step, the algorithm chooses each arm with probability equal to the posterior probability that it is the current best arm. We modify the strategy by introducing a parameter h which alters the importance of the probability of an arm being the current best arm. We show that the optimality of Thompson Sampling is robust to this perturbation within a range of parameter values for two-arm bandits.
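The abstract does not state exactly how h enters the selection rule. The following is a minimal sketch, assuming that arm i is chosen with probability proportional to p_i^h, where p_i is the posterior probability that arm i is currently the best. Under this assumption, h = 1 gives ordinary Thompson Sampling, h = 0 gives uniform play, and larger h favours the arm most likely to be best. The sketch also assumes Bernoulli rewards with uniform Beta(1, 1) priors. The function names `prob_best`, `choose_arm` and `run`, and the arm means 0.7 and 0.4, are hypothetical and only for illustration.

```python
import random


def prob_best(successes, failures, n_samples=200, rng=random):
    """Monte Carlo estimate of P(arm i has the highest mean) for every arm.

    Assumes Bernoulli rewards and independent Beta(1 + s_i, 1 + f_i) posteriors
    (uniform Beta(1, 1) priors).
    """
    k = len(successes)
    wins = [0] * k
    for _ in range(n_samples):
        # One joint draw from all posteriors; credit the arm with the largest sample.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        wins[max(range(k), key=draws.__getitem__)] += 1
    return [w / n_samples for w in wins]


def choose_arm(successes, failures, h, rng=random):
    """Hypothetical h-weighted rule: pick arm i with probability proportional to p_i ** h.

    h = 1 reproduces Thompson Sampling's selection probabilities (up to Monte Carlo
    error); h = 0 is uniform random play; larger h concentrates on the likely best arm.
    """
    if h < 0:
        raise ValueError("this sketch only handles h >= 0")
    p = prob_best(successes, failures, rng=rng)
    weights = [pi ** h for pi in p]  # in Python 0.0 ** 0 == 1.0, so h = 0 is uniform
    return rng.choices(range(len(p)), weights=weights)[0]


def run(true_means, h, horizon, seed=0):
    """Simulate the h-weighted rule on a Bernoulli bandit; return (regret, successes, failures)."""
    rng = random.Random(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        i = choose_arm(successes, failures, h, rng)
        if rng.random() < true_means[i]:
            successes[i] += 1
        else:
            failures[i] += 1
        regret += best - true_means[i]  # expected (pseudo-)regret of this pull
    return regret, successes, failures


if __name__ == "__main__":
    # Two-arm example; compare a few values of h on the same seed.
    for h in (0.5, 1.0, 2.0):
        regret, s, f = run([0.7, 0.4], h=h, horizon=200)
        pulls = [a + b for a, b in zip(s, f)]
        print(f"h={h}: pseudo-regret={regret:.1f}, pulls per arm={pulls}")
```

Because p_i is only estimated by sampling here, the h = 1 case matches Thompson Sampling only up to Monte Carlo error. For two arms, an exact computation of P(θ1 > θ2) under the Beta posteriors could replace `prob_best`.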

