LG · Aug 29, 2023

Directional Optimism for Safe Linear Bandits

Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh

arXiv:2308.15006v2 · 11.59 citations · h-index: 23 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses safe decision-making under uncertainty for applications like robotics or healthcare, though it appears incremental as it builds on existing safe bandit frameworks.

The paper tackles the safe linear bandit problem by introducing directional optimism, achieving improved regret guarantees for well-separated instances and finite star convex sets, and proposing a novel algorithm with better empirical performance and matching regret bounds.

The safe linear bandit problem is a version of the classical stochastic linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance, while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis-based approach.
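For readers new to the setting, the problem described in the abstract can be written out roughly as follows. This is a sketch of the standard safe linear bandit formulation, not notation taken from the paper: the symbols $\mathcal{X}$, $\theta$, $a$, $b$, $\epsilon_t$, $\eta_t$, and $R_T$ are all assumptions introduced here for illustration, and the paper's exact feedback model and assumptions may differ.

```latex
% Assumed notation (illustrative, not from the source):
%   \mathcal{X} \subset \mathbb{R}^d : action set
%   \theta : unknown reward parameter,  a : unknown constraint parameter
%   b : known constraint threshold,  \epsilon_t, \eta_t : zero-mean noise
\begin{align*}
  &\text{At each round } t = 1, \dots, T \text{ the learner picks } x_t \in \mathcal{X}
    \text{ and observes} \\
  &\qquad y_t = \theta^\top x_t + \epsilon_t, \qquad z_t = a^\top x_t + \eta_t, \\
  &\text{subject to the safety requirement} \\
  &\qquad a^\top x_t \le b \quad \text{for all } t \in \{1, \dots, T\}, \\
  &\text{with regret measured against the best safe action:} \\
  &\qquad R_T = \sum_{t=1}^{T} \left( \theta^\top x^\star - \theta^\top x_t \right),
    \qquad x^\star \in \arg\max_{x \in \mathcal{X} \,:\, a^\top x \le b} \theta^\top x .
\end{align*}
```

Under this reading, the constraint is "uncertain" because $a$ must be learned from the noisy observations $z_t$ while every chosen $x_t$ still has to be safe. The convex generalization mentioned at the end of the abstract would replace the linear constraint $a^\top x_t \le b$ with a more general convex one.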
