LGMLJun 12, 2023

Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

arXiv:2306.07071v25 citationsh-index: 5
Originality Highly original
AI Analysis

This addresses the problem of optimizing resource allocation under budget constraints for decision-makers, representing an incremental improvement with a novel method for a known bottleneck.

The paper tackled the stochastic Budgeted Multi-Armed Bandit problem by proposing a new UCB policy with asymmetric confidence intervals, which achieved logarithmic regret and outperformed existing policies in synthetic and real settings.

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, $ω$-UCB, that uses asymmetric confidence intervals. These intervals scale with the distance between the sample mean and the bounds of a random variable, yielding a more accurate and tight estimation of the reward-cost ratio compared to our competitors. We show that our approach has logarithmic regret and consistently outperforms existing policies in synthetic and real settings.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes