LGMLJun 13, 2023

Multi-Fidelity Multi-Armed Bandits Revisited

arXiv:2306.07761v18 citationsh-index: 64
Originality Incremental advance
AI Analysis

This work addresses a theoretical extension of bandit problems for researchers in machine learning, but it is incremental as it builds on existing multi-armed bandit frameworks.

The paper tackles the multi-fidelity multi-armed bandit problem by studying best arm identification and regret minimization, presenting lower bounds, algorithmic frameworks, and algorithms that match these bounds up to logarithmic terms, such as a worst-cost regret upper bound aligning with a lower bound of Ω(K^{1/3}Λ^{2/3}).

We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem. MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracy. We study both the best arm identification with fixed confidence (BAI) and the regret minimization objectives. For BAI, we present (a) a cost complexity lower bound, (b) an algorithmic framework with two alternative fidelity selection procedures, and (c) both procedures' cost complexity upper bounds. From both cost complexity bounds of MF-MAB, one can recover the standard sample complexity bounds of the classic (single-fidelity) MAB. For regret minimization of MF-MAB, we propose a new regret definition, prove its problem-independent regret lower bound $Ω(K^{1/3}Λ^{2/3})$ and problem-dependent lower bound $Ω(K\log Λ)$, where $K$ is the number of arms and $Λ$ is the decision budget in terms of cost, and devise an elimination-based algorithm whose worst-cost regret upper bound matches its corresponding lower bound up to some logarithmic terms and, whose problem-dependent bound matches its corresponding lower bound in terms of $Λ$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes