Multinomial Logit Bandit with Low Switching Cost
This work addresses the challenge of reducing adaptivity overhead in sequential decision-making for applications like recommendation systems, though it is incremental as it builds on existing bandit frameworks.
The paper tackles the problem of minimizing switching costs in multinomial logit bandits while achieving near-optimal regret, proposing algorithms like AT-DUCB with O(N log T) assortment switches, nearly matching a lower bound of Ω(N log T / log log T).
We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $Ω(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.