Regenerative Particle Thompson Sampling
This work is an incremental improvement that addresses the computational bottleneck of maintaining posterior distributions in bandit algorithms, with applications such as network optimization.
The paper tackles the practical difficulty of implementing Thompson sampling in stochastic bandit problems by proposing regenerative particle Thompson sampling (RPTS). RPTS improves upon particle Thompson sampling (PTS) by deleting unfit particles and regenerating new ones near fit particles. Empirically, it shows uniform improvement over PTS across representative problems, including 5G network slicing.
This paper proposes regenerative particle Thompson sampling (RPTS), a flexible variation of Thompson sampling. Thompson sampling itself is a Bayesian heuristic for solving stochastic bandit problems, but it is hard to implement in practice due to the intractability of maintaining a continuous posterior distribution. Particle Thompson sampling (PTS) is an approximation of Thompson sampling obtained by simply replacing the continuous distribution by a discrete distribution supported at a set of weighted static particles. We observe that in PTS, the weights of all but a few fit particles converge to zero. RPTS is based on the heuristic: delete the decaying unfit particles and regenerate new particles in the vicinity of fit surviving particles. Empirical evidence shows uniform improvement from PTS to RPTS and flexibility and efficacy of RPTS across a set of representative bandit problems, including an application to 5G network slicing.
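The delete-and-regenerate heuristic described above can be sketched for a Bernoulli bandit as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm. The function names (`pts_step`, `regenerate`, `run_rpts`), the regeneration trigger (`w_min`), the deleted fraction, the Gaussian perturbation with standard deviation `sigma`, and the rule for weighting new particles are all hypothetical choices made here for illustration.

```python
import random

EPS = 1e-3  # keep particle coordinates away from 0 and 1 (assumption)


def pts_step(particles, weights, pull, rng):
    """One round of particle Thompson sampling for a Bernoulli bandit."""
    # Thompson step: draw one particle according to the current weights
    # and play the arm that is best under that particle.
    theta = rng.choices(particles, weights=weights, k=1)[0]
    arm = max(range(len(theta)), key=lambda a: theta[a])
    reward = pull(arm)  # 0 or 1
    # Bayes update: multiply each weight by the likelihood of the reward.
    lik = [p[arm] if reward == 1 else 1.0 - p[arm] for p in particles]
    new_w = [w * l for w, l in zip(weights, lik)]
    total = sum(new_w)
    if total > 0.0:  # keep the old weights if every likelihood vanished
        weights = [w / total for w in new_w]
    return arm, reward, weights


def regenerate(particles, weights, rng, delete_fraction=0.5, sigma=0.05):
    """Delete the lowest-weight particles and respawn them near survivors."""
    n = len(particles)
    n_delete = int(delete_fraction * n)
    if n_delete == 0:
        return particles, weights
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    keep = order[: n - n_delete]
    survivors = [particles[i] for i in keep]
    surv_w = [weights[i] for i in keep]
    children = []
    for _ in range(n_delete):
        # Assumption: pick a parent in proportion to its weight and
        # perturb it with clipped Gaussian noise.
        parent = rng.choices(survivors, weights=surv_w, k=1)[0]
        child = tuple(
            min(1.0 - EPS, max(EPS, x + rng.gauss(0.0, sigma))) for x in parent
        )
        children.append(child)
    # Assumption: each new particle starts at the mean survivor weight.
    child_w = sum(surv_w) / len(surv_w)
    all_w = surv_w + [child_w] * n_delete
    total = sum(all_w)
    return survivors + children, [w / total for w in all_w]


def run_rpts(true_means, n_particles=50, horizon=2000, seed=0, w_min=1e-3):
    """Run the sketch on a Bernoulli bandit with the given arm means."""
    rng = random.Random(seed)
    k = len(true_means)
    particles = [tuple(rng.random() for _ in range(k)) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    pulls = [0] * k

    def pull(arm):
        return 1 if rng.random() < true_means[arm] else 0

    for _ in range(horizon):
        arm, _, weights = pts_step(particles, weights, pull, rng)
        pulls[arm] += 1
        # Hypothetical trigger: regenerate once half the particles have decayed.
        if sum(w < w_min for w in weights) >= n_particles // 2:
            particles, weights = regenerate(particles, weights, rng)
    return particles, weights, pulls
```

Calling `run_rpts([0.2, 0.5, 0.8])` returns the final particle set, its normalized weights, and the per-arm pull counts. Dropping the `regenerate` call from the loop gives plain PTS with static particles, which makes the two easy to compare on the same seed.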