Fixed-Budget Constrained Best Arm Identification in Grouped Bandits
This work addresses constrained decision-making in multi-attribute bandits, which is incremental but important for applications like resource allocation or recommendation systems.
The paper tackles the problem of identifying the best feasible arm in grouped bandits under a fixed budget, where arms have multiple attributes and must meet threshold constraints. It proposes the FCSR algorithm, which achieves optimal error probability bounds and outperforms baselines empirically while ensuring feasibility.
We study fixed budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic rewards. An arm is considered feasible only if all its attributes' means are above a given threshold. The aim is to find the feasible arm with the largest overall mean. We first derive a lower bound on the error probability for any algorithm on this setting. We then propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility. We show it attains optimal dependence on problem parameters up to constant factors in the exponent. Empirically, FCSR outperforms natural baselines while preserving feasibility guarantees.