LGOct 12, 2021

Query-Reward Tradeoffs in Multi-Armed Bandits

Nadav Merlis, Yonathan Efroni, Shie Mannor

arXiv:2110.05724v21.6

Originality Highly original

AI Analysis

This work addresses a resource-constrained variant of multi-armed bandits, offering theoretical guarantees for scenarios like online advertising or clinical trials where querying rewards incurs costs, representing an incremental advance in bandit theory.

The paper tackles the problem of balancing regret and query costs in stochastic multi-armed bandits where rewards must be actively queried to be observed, providing tight lower and upper bounds on both regret and query numbers, with results showing a fundamental difference between problems with unique versus multiple optimal arms.

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we prove that there is a fundamental difference between problems with a unique and multiple optimal arms, unlike in the standard multi-armed bandit problem. We also present a new, simple, UCB-style sampling concept, and show that it naturally adapts to the number of optimal arms and achieves tight regret and querying bounds.

View on arXiv PDF

Similar