MLLGFeb 14, 2025

Improved Online Confidence Bounds for Multinomial Logistic Bandits

arXiv:2502.10020v512 citationsh-index: 4ICML
Originality Highly original
AI Analysis

This work addresses the challenge of dependency on parameter bounds in bandit algorithms for multinomial logistic models, offering improved theoretical guarantees for online learning applications.

The paper tackles the problem of improving regret bounds in multinomial logistic bandits by deriving a tighter online confidence bound, resulting in a variance-dependent optimal regret of O(d log T sqrt(∑ σ_t^2)) and a poly(B)-free regret of O(d log(BT) sqrt(∑ σ_t^2)).

In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly minimax-optimal regret in MNL bandits. However, their results still depend on the norm-boundedness of the unknown parameter $B$ and the maximum size of possible outcomes $K$. To address this, we first derive an online confidence bound of $O\left(\sqrt{d \log t} + B \sqrt{d} \right)$, which is a significant improvement over the previous bound of $O (B \sqrt{d} \log t \log K )$ (Lee & Oh, 2024). This is mainly achieved by establishing tighter self-concordant properties of the MNL loss and applying Ville's inequality to bound the estimation error. Using this new online confidence bound, we propose a constant-time algorithm, OFU-MNL++, which achieves a variance-dependent regret bound of $O \Big( d \log T \sqrt{ \sum_{t=1}^T σ_t^2 } \Big) $ for sufficiently large $T$, where $σ_t^2$ denotes the variance of the rewards at round $t$, $d$ is the dimension of the contexts, and $T$ is the total number of rounds. Furthermore, we introduce a Maximum Likelihood Estimation (MLE)-based algorithm, OFU-MN$^2$L, which achieves an anytime poly(B)-free regret of $O \Big( d \log (BT) \sqrt{ \sum_{t=1}^T σ_t^2 } \Big) $.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes