LG DSSep 6, 2023

The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

arXiv:2309.03145v26.65 citationsh-index: 28

Originality Incremental advance

AI Analysis

This addresses a fundamental trade-off in streaming algorithms for multi-armed bandits, solving an open problem in theoretical computer science, though it is incremental as it builds on existing work.

The paper tackles the problem of pure exploration in multi-armed bandits with streaming algorithms, showing that any algorithm using optimal sample complexity and sublinear memory requires a near-optimal number of passes, specifically Ω(log(1/Δ)/log log(1/Δ)), matching prior algorithmic results up to lower-order terms.

We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(\frac{n}{Δ^2})$ requires $Ω(\frac{\log{(1/Δ)}}{\log\log{(1/Δ)}})$ passes. Here, $n$ is the number of arms and $Δ$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(\frac{1}Δ))$-pass algorithm of Jin et al. [ICML'21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC'20].

View on arXiv PDF

Similar