ST ML · Oct 23, 2020

Finite Continuum-Armed Bandits

arXiv:2010.12236v22.32 citations

Originality: Incremental advance

AI Analysis

This work addresses resource allocation in a nonparametric bandit setting with side information and proves guarantees on the optimal regret rate. The advance is incremental, but it gives specific results for covariate-dependent rewards.

The paper tackles the problem of allocating a limited budget T across a larger number N of actions, each usable at most once, with unknown stochastic rewards and one-dimensional covariates. It proposes an optimal strategy whose regret scales as O(T^{1/3}), up to poly-logarithmic factors, when T is proportional to N, and increases smoothly to O(T^{1/2}) as the ratio T/N decreases from a constant to N^{-1/3}.

We consider a situation where an agent has $T$ resources to be allocated to a larger number $N$ of actions. Each action can be completed at most once and results in a stochastic reward with unknown mean. The goal of the agent is to maximize her cumulative reward. Nontrivial strategies are possible when side information on the actions is available, for example in the form of covariates. Focusing on a nonparametric setting, where the mean reward is an unknown function of a one-dimensional covariate, we propose an optimal strategy for this problem. Under natural assumptions on the reward function, we prove that the optimal regret scales as $O(T^{1/3})$ up to poly-logarithmic factors when the budget $T$ is proportional to the number of actions $N$. When $T$ becomes small compared to $N$, a smooth transition occurs: as the ratio $T/N$ decreases from a constant to $N^{-1/3}$, the regret increases progressively up to the $O(T^{1/2})$ rate encountered in continuum-armed bandits.
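To make the problem setup concrete, here is a minimal Python sketch of the setting described above: $N$ actions with one-dimensional covariates, a budget of $T$ pulls, and each action used at most once. It is not the strategy proposed in the paper. The reward function `mean_reward`, the Bernoulli reward model, the uniform covariates on $[0, 1]$, and the naive uniform-sampling baseline are all assumptions made for illustration, and regret is measured here against an oracle that knows the means and plays the $T$ best actions.

```python
import random


def mean_reward(x):
    # Hypothetical mean-reward function of the covariate (an assumption,
    # not taken from the paper). Values lie in [0.5, 1].
    return 1.0 - abs(x - 0.5)


def simulate_uniform(n_actions, budget, seed=0):
    """Play `budget` distinct actions chosen uniformly at random.

    Returns (regret, realized_reward, chosen_indices), where regret is the
    gap in expected reward to an oracle playing the `budget` best actions.
    """
    rng = random.Random(seed)
    # One covariate per action, drawn uniformly on [0, 1] (assumption).
    covariates = [rng.random() for _ in range(n_actions)]
    means = [mean_reward(x) for x in covariates]

    # Oracle benchmark: sum of the `budget` largest mean rewards.
    oracle_value = sum(sorted(means, reverse=True)[:budget])

    # Naive baseline: sample without replacement, so each action
    # is completed at most once.
    chosen = rng.sample(range(n_actions), budget)

    # Bernoulli rewards with the unknown means (assumption).
    realized = sum(1 if rng.random() < means[i] else 0 for i in chosen)

    expected_value = sum(means[i] for i in chosen)
    regret = oracle_value - expected_value
    return regret, realized, chosen


if __name__ == "__main__":
    regret, realized, _ = simulate_uniform(n_actions=1000, budget=300)
    print(f"regret of uniform baseline: {regret:.2f}, realized reward: {realized}")
```

A baseline that ignores the covariates pays a regret that grows linearly in $T$, which is why the side information matters: a strategy that learns where the reward function is large can concentrate its budget on the actions whose covariates fall in that region.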
