LGMLMar 10, 2023

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

arXiv:2303.06058v36 citationsh-index: 23
Originality Incremental advance
AI Analysis

This work provides a general framework for analyzing bandit algorithms, which is incremental as it builds on existing methods to unify and extend regret analysis across different distribution models.

The authors tackled the problem of deriving regret bounds for randomized multi-armed bandit algorithms by proposing a general methodology based on checking sufficient conditions on sampling probabilities and distributions, and applied it to prove asymptotic optimality for MED and analyze TS algorithms across various distribution models, including a new non-parametric TS algorithm for unbounded distributions with bounded h-moments.

In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the distributions including single parameter exponential families, Gaussian distributions, bounded distributions, or distributions satisfying some conditions on their moments. In particular, we prove that MED is asymptotically optimal for all these models, but also provide a simple regret analysis of some TS algorithms for which the optimality is already known. We then further illustrate the interest of our approach, by analyzing a new Non-Parametric TS algorithm (h-NPTS), adapted to some families of unbounded reward distributions with a bounded h-moment. This model can for instance capture some non-parametric families of distributions whose variance is upper bounded by a known constant.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes