MLLGSep 15, 2023

Price of Safety in Linear Best Arm Identification

arXiv:2309.08709v15 citationsh-index: 5
Originality Incremental advance
AI Analysis

This addresses safety-critical decision-making in applications like clinical trials or robotics, though it is incremental as it extends existing linear bandit methods to the best-arm identification setting.

The paper tackles the problem of safe best-arm identification with linear feedback, where an agent must satisfy stage-wise safety constraints while identifying the best arm, and shows that safety constraints increase sample complexity by an extra term due to forced exploration.

We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector. The agent must take actions in a conservative way so as to ensure that the safety constraint is not violated with high probability at each round. Ways of leveraging the linear structure for ensuring safety has been studied for regret minimization, but not for best-arm identification to the best our knowledge. We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring the stage-wise safety. We show that we pay an extra term in the sample complexity due to the forced exploration phase incurred by the additional safety constraint. Experimental illustrations are provided to justify the design of our algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes