LG THOct 4, 2025

Beyond Softmax: A New Perspective on Gradient Bandits

arXiv:2510.03979v14.1h-index: 1

Originality Incremental advance

AI Analysis

This work addresses a problem in online learning and multi-armed bandits by providing more flexible and computationally efficient algorithms, though it appears incremental as it builds on existing gradient bandit methods.

The paper tackles the limitation of softmax-based gradient bandit algorithms by introducing a new framework that relaxes restrictive independence assumptions, allowing for correlated learning dynamics across actions. The result includes sublinear regret bounds for a broad algorithmic family and practical effectiveness demonstrated in numerical experiments.

We establish a link between a class of discrete choice models and the theory of online learning and multi-armed bandits. Our contributions are: (i) sublinear regret bounds for a broad algorithmic family, encompassing Exp3 as a special case; (ii) a new class of adversarial bandit algorithms derived from generalized nested logit models \citep{wen:2001}; and (iii) \textcolor{black}{we introduce a novel class of generalized gradient bandit algorithms that extends beyond the widely used softmax formulation. By relaxing the restrictive independence assumptions inherent in softmax, our framework accommodates correlated learning dynamics across actions, thereby broadening the applicability of gradient bandit methods.} Overall, the proposed algorithms combine flexible model specification with computational efficiency via closed-form sampling probabilities. Numerical experiments in stochastic bandit settings demonstrate their practical effectiveness.

View on arXiv PDF

Similar