LG MLOct 2, 2018

Thompson Sampling Algorithms for Cascading Bandits

Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

arXiv:1810.01187v49.419 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses the problem of efficient optimization in online recommender systems for researchers and practitioners, offering incremental theoretical advancements.

The paper tackled the lack of theoretical guarantees for Thompson sampling algorithms in cascading bandits, providing tighter regret bounds and introducing new algorithms that achieve state-of-the-art performance, with numerical experiments showing superiority over existing methods.

Motivated by the pressing need for efficient optimization in online recommender systems, we revisit the cascading bandit model proposed by Kveton et al. (2015). While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are only known for the latter. In this paper, we first provide a problem-dependent upper bound on the regret of a TS algorithm with Beta-Bernoulli updates; this upper bound is tighter than a recent derivation under a more general setting by Huyuk and Tekin (2019). Next, we design and analyze another TS algorithm with Gaussian updates, TS-Cascade. TS-Cascade achieves the state-of-the-art regret bound for cascading bandits. Complementarily, we consider a linear generalization of the cascading bandit model, which allows efficient learning in large cascading bandit problem instances. We introduce and analyze a TS algorithm, which enjoys a regret bound that depends on the dimension of the linear model but not the number of items. Finally, by using information-theoretic techniques and judiciously constructing cascading bandit instances, we derive a nearly matching regret lower bound for the standard model. Our paper establishes the first theoretical guarantees on TS algorithms for stochastic combinatorial bandit problem model with partial feedback. Numerical experiments demonstrate the superiority of the proposed TS algorithms compared to existing UCB-based ones.

View on arXiv PDF Code

Similar