ML, LG | Jun 9, 2022

Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification

arXiv:2206.04646v3 | 22 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap in multi-armed bandit research, though it appears incremental because it builds on known exponential bounds.

The paper tackles the fixed-budget best arm identification problem by characterizing the minimax optimal rate for misidentifying the best arm, introducing two rates (R^go and R^go_∞) and associated algorithms, with R^go-tracking outperforming existing methods.

We consider the fixed-budget best arm identification problem, where the goal is to find the arm with the largest mean using a fixed number of samples. It is known that the probability of misidentifying the best arm is exponentially small in the number of rounds. However, the rate (exponent) of this probability has received only limited characterization. In this paper, we characterize the minimax optimal rate as the result of an optimization over all possible parameters. We introduce two rates, $R^{\mathrm{go}}$ and $R^{\mathrm{go}}_{\infty}$, corresponding to lower bounds on the probability of misidentification, each of which is associated with a proposed algorithm. The rate $R^{\mathrm{go}}$ is associated with $R^{\mathrm{go}}$-tracking, which can be efficiently implemented by a neural network and is shown to outperform existing algorithms. However, this rate requires a nontrivial condition to be achievable. To address this issue, we introduce the second rate, $R^{\mathrm{go}}_{\infty}$. We show that this rate is indeed achievable by introducing a conceptual algorithm called delayed optimal tracking (DOT).
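To make the problem setting concrete, the following is a minimal sketch of fixed-budget best arm identification on Bernoulli arms. It is not the paper's $R^{\mathrm{go}}$-tracking or DOT algorithm: it uses plain uniform (round-robin) allocation as a baseline, and the function names `uniform_allocation_bai`, `estimate_error`, and `empirical_rate` are hypothetical, introduced only for illustration. It estimates the misidentification probability by Monte Carlo and reports the empirical exponent $-\frac{1}{T}\log P(\text{error})$, the quantity whose optimal value the abstract calls the rate.

```python
import math
import random


def uniform_allocation_bai(means, budget, rng):
    """Pull Bernoulli arms round-robin for `budget` rounds and return the
    index of the arm with the highest empirical mean.

    Assumption: uniform allocation is a simple baseline, not the paper's method.
    Ties are broken in favour of the lowest index.
    """
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k  # round-robin allocation
        reward = 1.0 if rng.random() < means[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
    # Arms that were never pulled (budget < k) can never be recommended.
    empirical = [
        sums[i] / counts[i] if counts[i] else float("-inf") for i in range(k)
    ]
    return max(range(k), key=lambda i: empirical[i])


def estimate_error(means, budget, trials, seed=0):
    """Monte Carlo estimate of the probability of misidentifying the best arm."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda i: means[i])
    errors = sum(
        uniform_allocation_bai(means, budget, rng) != best for _ in range(trials)
    )
    return errors / trials


def empirical_rate(p_error, budget):
    """Empirical exponent -(1/T) log P(error); infinite if no error was observed."""
    return math.inf if p_error == 0 else -math.log(p_error) / budget


if __name__ == "__main__":
    means = [0.6, 0.4, 0.3]  # illustrative arm means, not taken from the paper
    for budget in (30, 120, 480):
        p = estimate_error(means, budget, trials=2000)
        print(budget, p, empirical_rate(p, budget))
```

For larger budgets the estimated error shrinks quickly, so a finite number of trials may observe no errors at all; the sketch then reports an infinite rate rather than guessing a value.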

Code Implementations: 1 repo
