LGMar 13, 2025

Multi-thresholding Good Arm Identification with Bandit Feedback

arXiv:2503.10386v3h-index: 17
Originality Incremental advance
AI Analysis

This work addresses a multi-objective decision-making problem for bandit algorithms, but it is incremental as it extends existing single-objective methods to multiple thresholds.

The paper tackles the problem of identifying a good arm in a multi-objective stochastic bandit setting, where arms have reward vectors, and proposes the MultiTUCB algorithm with a sample complexity bound that matches existing results in special cases and shows superior performance on synthetic and real datasets.

We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i \in [K]$ is associated with a distribution $D_i$ defined over $R^M$. For each round $t$, the player pulls an arm $i_t$ and receives an $M$-dimensional reward vector sampled according to $D_{i_t}$. The goal is to find, with high probability, an $ε$-good arm whose expected reward vector is larger than $\bmξ - ε\mathbf{1}$, where $\bmξ$ is a predefined threshold vector, and the vector comparison is component-wise. We propose the Multi-Thresholding UCB~(MultiTUCB) algorithm with a sample complexity bound. Our bound matches the existing one in the special case where $M=1$ and $ε=0$. The proposed algorithm demonstrates superior performance compared to baseline approaches across synthetic and real datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes