LGAIMLMar 6, 2023

Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors

arXiv:2303.03348v1h-index: 8
AI Analysis

This work addresses bandit optimization for scenarios with uncertain variance, but it is incremental as it extends prior methods with specific priors.

The paper tackles the linear bandit problem with unknown variance by using Thompson sampling with normal-gamma priors, deriving a Bayesian regret bound under a moment condition.

We consider Thompson sampling for linear bandit problems with finitely many independent arms, where rewards are sampled from normal distributions that are linearly dependent on unknown parameter vectors and with unknown variance. Specifically, with a Bayesian formulation we consider multivariate normal-gamma priors to represent environment uncertainty for all involved parameters. We show that our chosen sampling prior is a conjugate prior to the reward model and derive a Bayesian regret bound for Thompson sampling under the condition that the 5/2-moment of the variance distribution exist.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes