Xuanke Jiang, Sherief Hashima, Kohei Hatano et al.
We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i \in [K]$ is associated with a distribution $D_i$ defined over $R^M$. For each round $t$, the player pulls an arm $i_t$ and receives an $M$-dimensional reward vector sampled according to $D_{i_t}$. The goal is to find, with high probability, an $ε$-good arm whose expected reward vector is larger than $\bmξ - ε\mathbf{1}$, where $\bmξ$ is a predefined threshold vector, and the vector comparison is component-wise. We propose the Multi-Thresholding UCB~(MultiTUCB) algorithm with a sample complexity bound. Our bound matches the existing one in the special case where $M=1$ and $ε=0$. The proposed algorithm demonstrates superior performance compared to baseline approaches across synthetic and real datasets.