A Note on a Tight Lower Bound for MNL-Bandit Assortment Selection Models
This provides a theoretical foundation for assortment selection algorithms, though it is incremental as it refines existing bounds.
The paper tackles the dynamic assortment planning problem under the capacitated MNL bandit model by proving a tight lower bound on accumulated regret that matches existing upper bounds for all parameters, closing an O(√K) gap from prior work.
In this short note we consider a dynamic assortment planning problem under the capacitated multinomial logit (MNL) bandit model. We prove a tight lower bound on the accumulated regret that matches existing regret upper bounds for all parameters (time horizon $T$, number of items $N$ and maximum assortment capacity $K$) up to logarithmic factors. Our results close an $O(\sqrt{K})$ gap between upper and lower regret bounds from existing works.