LGOct 23, 2024

Optimal Streaming Algorithms for Multi-Armed Bandits

arXiv:2410.17835v116 citationsh-index: 66ICML
Originality Incremental advance
AI Analysis

It addresses efficient decision-making in streaming data for applications like online advertising or recommendation systems, with incremental improvements over prior work.

This paper tackles the problem of identifying top-k arms or the best arm in a streaming multi-armed bandit setting with limited memory, achieving optimal sample complexity of O(n/ε² log(k/δ)) for top-k identification and near-optimal instance-dependent complexity with O(log Δ₂⁻¹) passes for best arm identification.

This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of $n$ arms with reward distributions supported on $[0,1]$ with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \eps-$top$-$k$ arms identification problem, which asks for $k$ arms whose reward means are lower than that of the $k$-th best arm by at most $\eps$ with probability at least $1-δ$. For general $\eps \in (0,1)$, the existing solution for this problem assumes $k = 1$ and achieves the optimal sample complexity $O(\frac{n}{\eps^2} \log \frac{1}δ)$ using $O(\log^*(n))$ ($\log^*(n)$ equals the number of times that we need to apply the logarithm function on $n$ before the results is no more than 1.) memory and a single pass of the stream. We propose an algorithm that works for any $k$ and achieves the optimal sample complexity $O(\frac{n}{\eps^2} \log\frac{k}δ)$ using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least $1-δ$ probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(\log Δ_2^{-1})$ passes, where $Δ_2$ is the gap between the mean of the best arm and that of the second best arm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes