Streaming Algorithms for Stochastic Multi-armed Bandits
This work provides theoretical bounds and algorithms for streaming multi-armed bandit problems under memory constraints, which is relevant for resource-constrained decision-making systems.
This paper addresses the Stochastic Multi-armed Bandit problem with bounded arm-memory, where arms arrive in a stream and only memory-resident arms can be pulled. For regret minimization, they establish an almost tight hardness of Ω(T^{2/3}) cumulative regret for an arm-memory size of (n-1). For best-arm identification, they propose an O(r) arm-memory r-round adaptive streaming algorithm that matches the lower bound on sample complexity, and a heuristic achieving optimal sample complexity by storing one extra arm.
We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight hardness. We show Ω(T^{2/3}) cumulative regret in expectation for arm-memory size of (n-1), where n is the number of arms. For best-arm identification, we study two algorithms. First, we present an O(r) arm-memory r-round adaptive streaming algorithm to find an ε-best arm. In r-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on the observed outcomes in the earlier rounds. The best-arm is the output at the end of r rounds. The upper bound on the sample complexity of our algorithm matches with the lower bound for any r-round adaptive streaming algorithm. Secondly, we present a heuristic to find the ε-best arm with optimal sample complexity, by storing only one extra arm in the memory.