LGDec 11, 2024

Constrained Best Arm Identification in Grouped Bandits

arXiv:2412.08031v12 citationsh-index: 3SPCOM
AI Analysis

This addresses a constrained decision-making problem in bandit settings, which is incremental as it extends existing grouped bandit frameworks with feasibility constraints.

The paper tackles the problem of identifying the best feasible arm in grouped bandits, where arms have multiple attributes with independent stochastic rewards and feasibility requires all attributes to exceed a threshold. It proposes a near-optimal policy with analytical guarantees and compares it to modified action elimination methods via simulations.

We study a grouped bandit setting where each arm comprises multiple independent sub-arms referred to as attributes. Each attribute of each arm has an independent stochastic reward. We impose the constraint that for an arm to be deemed feasible, the mean reward of all its attributes should exceed a specified threshold. The goal is to find the arm with the highest mean reward averaged across attributes among the set of feasible arms in the fixed confidence setting. We first characterize a fundamental limit on the performance of any policy. Following this, we propose a near-optimal confidence interval-based policy to solve this problem and provide analytical guarantees for the policy. We compare the performance of the proposed policy with that of two suitably modified versions of action elimination via simulations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes