MLLGMEDec 21, 2023

Best Arm Identification in Batched Multi-armed Bandit Problems

arXiv:2312.13875v11 citationsh-index: 1
Originality Incremental advance
AI Analysis

This addresses the challenge of efficient exploration in batch-limited scenarios like biological experimentation and online marketing, but it is incremental as it builds on existing bandit methods.

The paper tackles the problem of best arm identification in batched multi-armed bandits with many arms and few batches, introducing a linear programming framework and a two-stage algorithm that achieves good theoretical properties and competitive performance in numerical studies.

Recently multi-armed bandit problem arises in many real-life scenarios where arms must be sampled in batches, due to limited time the agent can wait for the feedback. Such applications include biological experimentation and online marketing. The problem is further complicated when the number of arms is large and the number of batches is small. We consider pure exploration in a batched multi-armed bandit problem. We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification. The linear program leads to a two-stage algorithm that can achieve good theoretical properties. We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes