LG MLMay 9, 2024

Imprecise Multi-Armed Bandits

arXiv:2405.05673v11 citations

Originality Incremental advance

AI Analysis

This addresses uncertainty modeling in decision-making for applications like reinforcement learning, but it is incremental as it extends existing bandit frameworks with imprecise probabilities.

The paper tackles the problem of multi-armed bandits with imprecise probabilities by introducing a framework where arms are associated with credal sets, defining regret based on lower previsions, and modeling it as a two-player game. It proposes an algorithm for certain hypothesis classes, proves an upper bound on regret, and provides lower bounds for special cases.

We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes (which can be richer than just the reward). The arm-to-credal-set correspondence comes from a known class of hypotheses. We then define a notion of regret corresponding to the lower prevision defined by these credal sets. Equivalently, the setting can be regarded as a two-player zero-sum game, where, on each round, the agent chooses an arm and the adversary chooses the distribution over outcomes from a set of options associated with this arm. The regret is defined with respect to the value of game. For certain natural hypothesis classes, loosely analgous to stochastic linear bandits (which are a special case of the resulting setting), we propose an algorithm and prove a corresponding upper bound on regret. We also prove lower bounds on regret for particular special cases.

View on arXiv PDF

Similar