LG, IT. Oct 5, 2017

A study of Thompson Sampling with Parameter h

arXiv:1710.02174v1
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for researchers in bandit algorithms, focusing on parameter sensitivity.

The authors modify the Thompson Sampling algorithm for stochastic multi-armed bandits by introducing a parameter h that changes how much weight the selection rule gives to each arm's probability of being the current best arm. They show that, for two-arm bandits, the optimality of Thompson Sampling is robust to this change within a specific range of h.

The Thompson Sampling algorithm is a well-known Bayesian algorithm for solving the stochastic multi-armed bandit problem. At each time step the algorithm chooses each arm with probability proportional to the probability of it being the current best arm. We modify the strategy by introducing a parameter h which alters the importance of the probability of an arm being the current best arm. We show that the optimality of Thompson Sampling is robust to this perturbation within a range of parameter values for two-arm bandits.
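The abstract does not give the exact form of the modified selection rule, so the following is only a minimal sketch under a stated assumption: each arm is chosen with probability proportional to p_i ** h, where p_i is the posterior probability that arm i is the best. Under that assumption h = 1 recovers plain Thompson Sampling. The Bernoulli rewards, uniform Beta(1, 1) priors, Monte Carlo estimate of p_i, and all function names (`best_arm_probs`, `selection_probs`, `run_bandit`) are illustrative choices, not taken from the paper.

```python
import random


def best_arm_probs(successes, failures, n_samples, rng):
    """Monte Carlo estimate of P(arm i has the highest mean) under Beta posteriors."""
    k = len(successes)
    wins = [0] * k
    for _ in range(n_samples):
        # One joint draw from the Beta(1 + s, 1 + f) posterior of every arm.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


def selection_probs(p_best, h):
    """ASSUMED rule: weight arm i by p_i ** h, then normalise (requires h > 0)."""
    weights = [p ** h for p in p_best]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(true_means, h, horizon, seed=0, n_samples=200):
    """Play a Bernoulli bandit for `horizon` steps; return the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        p_best = best_arm_probs(successes, failures, n_samples, rng)
        probs = selection_probs(p_best, h)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Two-arm example; compare how often each arm is pulled for several h.
    for h in (0.5, 1.0, 2.0):
        print(h, run_bandit([0.6, 0.4], h=h, horizon=500))
```

Under this assumed rule, h > 1 sharpens the selection distribution toward the arm currently believed best, while h < 1 flattens it toward uniform exploration. The range of h for which optimality is retained is the paper's result and is not reproduced by this sketch.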

