LGFeb 12, 2024

Contextual Multinomial Logit Bandits with General Value Functions

arXiv:2402.08126v29.24 citationsh-index: 11NIPS

Originality Incremental advance

AI Analysis

This work addresses a bottleneck in contextual bandit models for real-world recommendation systems, providing incremental improvements in flexibility and performance.

The paper tackles the limitation of prior contextual multinomial logit bandits, which only used linear value functions, by extending them to general value functions for broader applicability in assortment recommendation problems like online retailing and advertising. It proposes algorithms for stochastic and adversarial settings, achieving results that eliminate dependence on a problem-dependent constant and offer advantages like computational efficiency and dimension-free regret bounds.

Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this fact, in this work, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with different computation-regret trade-off. When applied to the linear case, our results not only are the first ones with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.

View on arXiv PDF

Similar