LG ST MLJun 28, 2023

Allocating Divisible Resources on Arms with Unknown and Random Rewards

arXiv:2306.16578v22.0h-index: 15

Originality Highly original

AI Analysis

This work addresses resource allocation in multi-armed bandit settings with variable feedback noise, which is incremental but provides theoretical insights for decision-making under uncertainty.

The paper tackles the problem of allocating divisible resources to arms with unknown and random rewards, where reward variance scales with resource allocation, and demonstrates a phase transition at b=1/2 with optimal regret bounds for b in [0,1].

We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is$Y_i(A_i)=A_i μ_i+A_i^b ξ_{i}$, where $μ_i$ is the unknown mean and the noise $ξ_{i}$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.

View on arXiv PDF

Similar