Batched Stochastic Bandit for Nondegenerate Functions
This work addresses communication efficiency in bandit learning for nondegenerate functions, offering a near-optimal solution with minimal batches, which is incremental but practically relevant for distributed or resource-constrained settings.
The paper tackles the batched bandit problem for nondegenerate functions by introducing the Geometric Narrowing algorithm, which achieves a regret bound of order $\widetilde{\mathcal{O}}(A_{+}^d \sqrt{T})$ using only $\mathcal{O}(\log \log T)$ batches, and provides lower bounds showing this is near-optimal.
This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $π$, there exists a problem instance on which $π$ admits a regret of order $Ω ( A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $ A_-^d \sqrt{T} $ over all problem instances, using less than $ Ω( \log \log T ) $ rounds of communications. Our lower bound analysis shows that the GN algorithm achieves near optimal regret with minimal number of batches.