IT AI Oct 7, 2025

Risk level dependent Minimax Quantile lower bounds for Interactive Statistical Decision Making

Raghav Bongole, Amirreza Zamani, Tobias J. Oechtering, Mikael Skoglund

arXiv:2510.05808v11.2h-index: 28

Originality: Incremental advance

AI Analysis

This work provides a quantile-specific toolkit for analyzing rare failures in interactive protocols, which is incremental but important for safety-critical applications like bandits and reinforcement learning.

The paper addresses the lack of risk-level-specific quantile bounds in interactive statistical decision making, for example in safety-critical bandits. It develops high-probability tools that yield explicit minimax-quantile bounds, and these recover optimal-rate bounds for a two-armed Gaussian bandit.

Minimax risk and regret focus on expectation, missing rare failures critical in safety-critical bandits and reinforcement learning. Minimax quantiles capture these tails. Three strands of prior work motivate this study: minimax-quantile bounds restricted to non-interactive estimation; unified interactive analyses that focus on expected risk rather than risk-level-specific quantile bounds; and high-probability bandit bounds that still lack a quantile-specific toolkit for general interactive protocols. To close this gap, within the interactive statistical decision making framework, we develop high-probability Fano and Le Cam tools and derive risk-level-explicit minimax-quantile bounds, including a quantile-to-expectation conversion and a tight link between strict and lower minimax quantiles. Instantiating these results for the two-armed Gaussian bandit immediately recovers optimal-rate bounds.
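To make the contrast between expectation and quantiles concrete, here is a minimal Python sketch. It is not taken from the paper: the toy loss distribution (a loss of 1.0 on most runs, 100.0 on a rare failure) and the helper names `simulate_losses` and `empirical_quantile` are assumptions made for illustration. It shows that the (1 - delta)-quantile of the loss depends strongly on the risk level delta while the mean is a single number, and it checks the elementary Markov-type relation E[L] >= delta * Quantile(1 - delta) for a nonnegative loss L. That relation is one simple way to pass from a quantile bound to an expectation bound, and it may differ from the conversion derived in the paper.

```python
import math
import random


def simulate_losses(n, fail_prob, typical_loss, fail_loss, seed=0):
    """Draw n i.i.d. losses: fail_loss with probability fail_prob, else typical_loss.

    Hypothetical toy model, used only to illustrate a rare, costly failure.
    """
    rng = random.Random(seed)
    return [fail_loss if rng.random() < fail_prob else typical_loss for _ in range(n)]


def empirical_quantile(samples, level):
    """Return an order statistic r whose empirical P(X <= r) is at least `level`."""
    ordered = sorted(samples)
    n = len(ordered)
    # 0-based index of the ceil(level * n)-th smallest sample, clamped to range.
    k = min(n - 1, max(0, math.ceil(level * n) - 1))
    return ordered[k]


losses = simulate_losses(n=100_000, fail_prob=0.02, typical_loss=1.0, fail_loss=100.0)
mean_loss = sum(losses) / len(losses)

for delta in (0.05, 0.01):
    q = empirical_quantile(losses, 1.0 - delta)
    # Markov-type check for nonnegative losses: mean >= delta * quantile.
    print(f"delta={delta}: quantile={q}, delta*quantile={delta * q:.2f}, mean={mean_loss:.2f}")
```

In this toy model failures occur with probability 0.02, so the quantile at delta = 0.05 sits at the typical loss while the quantile at delta = 0.01 jumps to the failure loss. The mean cannot distinguish between these two risk levels.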
