LGOct 23, 2024

Optimal Streaming Algorithms for Multi-Armed Bandits

Tianyuan Jin, Keke Huang, Jing Tang, Xiaokui Xiao

arXiv:2410.17835v116.417 citationsh-index: 66ICML

Originality Incremental advance

AI Analysis

It addresses efficient decision-making in streaming data for applications like online advertising or recommendation systems, with incremental improvements over prior work.

This paper tackles the problem of identifying top-k arms or the best arm in a streaming multi-armed bandit setting with limited memory, achieving optimal sample complexity of O(n/ε² log(k/δ)) for top-k identification and near-optimal instance-dependent complexity with O(log Δ₂⁻¹) passes for best arm identification.

This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of $n$ arms with reward distributions supported on $[0,1]$ with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \eps-$top$-$k$ arms identification problem, which asks for $k$ arms whose reward means are lower than that of the $k$-th best arm by at most $\eps$ with probability at least $1-δ$. For general $\eps \in (0,1)$, the existing solution for this problem assumes $k = 1$ and achieves the optimal sample complexity $O(\frac{n}{\eps^2} \log \frac{1}δ)$ using $O(\log^*(n))$ ($\log^*(n)$ equals the number of times that we need to apply the logarithm function on $n$ before the results is no more than 1.) memory and a single pass of the stream. We propose an algorithm that works for any $k$ and achieves the optimal sample complexity $O(\frac{n}{\eps^2} \log\frac{k}δ)$ using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least $1-δ$ probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(\log Δ_2^{-1})$ passes, where $Δ_2$ is the gap between the mean of the best arm and that of the second best arm.

View on arXiv PDF

Similar