Tight Memory-Regret Lower Bounds for Streaming Bandits
This work addresses the fundamental limitations of regret minimization in streaming bandits for algorithms with bounded memory, revealing a separation from centralized stochastic bandits.
The paper establishes tight worst-case and instance-dependent regret lower bounds for streaming bandits with sublinear arm memory, showing an unavoidable double logarithmic factor compared to classical settings, and provides a matching multi-pass algorithm with constant memory.
In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret by dealing with online arriving arms and sublinear arm memory. We establish the tight worst-case regret lower bound of $Ω\left( (TB)^α K^{1-α}\right), α= 2^{B} / (2^{B+1}-1)$ for any algorithm with a time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $Ω(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory permitted. Furthermore, we establish the first instance-dependent lower bound of $Ω\left(T^{1/(B+1)} \sum_{Δ_x>0} \frac{μ^*}{Δ_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $ε$-optimal arms identification tasks, which maybe of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^α K^{1 - α}\right)$ using constant arm memory.