Discrete Choice Multi-Armed Bandits
This work addresses the problem of designing flexible and efficient bandit algorithms for decision-making under uncertainty, but it appears incremental as it builds on existing discrete choice models and algorithms like Exp3.
The paper connects discrete choice models to online learning and multi-armed bandits, providing sublinear regret bounds for a family of algorithms and introducing novel adversarial bandit algorithms based on generalized nested logit models, with numerical experiments in the stochastic case.
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms. Our contributions can be summarized in two key aspects. Firstly, we furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case. Secondly, we introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models initially introduced by \citet{wen:2001}. These algorithms offer users the flexibility to fine-tune the model extensively, as they can be implemented efficiently due to their closed-form sampling distribution probabilities. To demonstrate the practical implementation of our algorithms, we present numerical experiments, focusing on the stochastic bandit case.