MLLGDec 3, 2024

An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

arXiv:2412.02861v21 citationsh-index: 28
Originality Incremental advance
AI Analysis

This provides a theoretical analysis for logistic bandits, offering tighter regret bounds that are incremental improvements over prior work in this specific domain.

The paper tackles the performance of Thompson Sampling for logistic bandit problems, establishing an improved information ratio bound of 9/2 * d * α^{-2} and deriving a Bayesian expected regret bound of O(d/α * sqrt(T * log(βT/d))), which depends only logarithmically on β and is independent of the number of actions.

We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(β\langle a, θ\rangle)/(1+\exp(β\langle a, θ\rangle))$, with slope parameter $β>0$, and where both the action $a\in \mathcal{A}$ and parameter $θ\in \mathcal{O}$ lie within the $d$-dimensional unit ball. Adopting the information-theoretic framework introduced by Russo and Van Roy (2016), we analyze the information ratio, a statistic that quantifies the trade-off between the immediate regret incurred and the information gained about the optimal action. We improve upon previous results by establishing that the information ratio is bounded by $\tfrac{9}{2}dα^{-2}$, where $α$ is a minimax measure of the alignment between the action space $\mathcal{A}$ and the parameter space $\mathcal{O}$, and is independent of $β$. Using this result, we derive a bound of order $O(d/α\sqrt{T \log(βT/d)})$ on the Bayesian expected regret of Thompson Sampling incurred after $T$ time steps. To our knowledge, this is the first regret bound for logistic bandits that depends only logarithmically on $β$ while being independent of the number of actions. In particular, when the action space contains the parameter space, the bound on the expected regret is of order $\tilde{O}(d \sqrt{T})$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes