ML LGMay 9, 2024

Batched Stochastic Bandit for Nondegenerate Functions

arXiv:2405.05733v37.52 citationsIEEE Trans Inf Theory

Originality Incremental advance

AI Analysis

This work addresses communication efficiency in bandit learning for nondegenerate functions, offering a near-optimal solution with minimal batches, which is incremental but practically relevant for distributed or resource-constrained settings.

The paper tackles the batched bandit problem for nondegenerate functions by introducing the Geometric Narrowing algorithm, which achieves a regret bound of order $\widetilde{\mathcal{O}}(A_{+}^d \sqrt{T})$ using only $\mathcal{O}(\log \log T)$ batches, and provides lower bounds showing this is near-optimal.

This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $π$, there exists a problem instance on which $π$ admits a regret of order $Ω ( A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $ A_-^d \sqrt{T} $ over all problem instances, using less than $ Ω( \log \log T ) $ rounds of communications. Our lower bound analysis shows that the GN algorithm achieves near optimal regret with minimal number of batches.

View on arXiv PDF

Similar