LGOct 12, 2021

Query-Reward Tradeoffs in Multi-Armed Bandits

arXiv:2110.05724v2
Originality Highly original
AI Analysis

This work addresses a resource-constrained variant of multi-armed bandits, offering theoretical guarantees for scenarios like online advertising or clinical trials where querying rewards incurs costs, representing an incremental advance in bandit theory.

The paper tackles the problem of balancing regret and query costs in stochastic multi-armed bandits where rewards must be actively queried to be observed, providing tight lower and upper bounds on both regret and query numbers, with results showing a fundamental difference between problems with unique versus multiple optimal arms.

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we prove that there is a fundamental difference between problems with a unique and multiple optimal arms, unlike in the standard multi-armed bandit problem. We also present a new, simple, UCB-style sampling concept, and show that it naturally adapts to the number of optimal arms and achieves tight regret and querying bounds.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes