Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
This work addresses multi-objective decision-making in settings such as recommender systems, where factors beyond click rate must be optimized. Its contribution is incremental, however, in that it extends existing bandit frameworks to vector losses.
The paper studies multi-armed bandits with multiple losses. It defines a relative loss vector for each arm and pursues two goals: identifying, at a given confidence level, the arm with the minimum $\ell^\infty$-norm of relative losses, and minimizing the $\ell^\infty$-norm of cumulative relative losses. For the first goal, the authors derive a problem-dependent sample complexity lower bound and discuss how to obtain matching algorithms; for the second, they give a regret lower bound of $\Omega(T^{2/3})$ and a matching algorithm.
Multi-armed bandits are widely applied in scenarios such as recommender systems, where the goal is typically to maximize the click rate. However, other factors should also be considered, e.g., user stickiness, user growth rate, and user experience assessment. In this paper, we model this situation as a $K$-armed bandit problem with multiple losses. We define the relative loss vector of an arm, whose $i$-th entry compares the arm with the optimal arm with respect to the $i$-th loss. We study two goals: (a) finding the arm with the minimum $\ell^\infty$-norm of relative losses at a given confidence level (which corresponds to fixed-confidence best-arm identification); (b) minimizing the $\ell^\infty$-norm of cumulative relative losses (which corresponds to regret minimization). For goal (a), we derive a problem-dependent sample complexity lower bound and discuss how to achieve matching algorithms. For goal (b), we provide a regret lower bound of $\Omega(T^{2/3})$ and a matching algorithm.
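To make the objective in goal (a) concrete, the following minimal sketch computes relative loss vectors and selects the arm with the smallest $\ell^\infty$-norm. It rests on assumptions not stated in the abstract: the $i$-th relative loss of an arm is taken to be the gap between its mean $i$-th loss and the smallest mean $i$-th loss over all arms, and the mean losses are treated as known, whereas a bandit algorithm would have to estimate them from samples. The function names and the example numbers are hypothetical.

```python
from typing import List, Tuple


def relative_losses(mean_losses: List[List[float]]) -> List[List[float]]:
    """Return the relative loss vector of every arm.

    mean_losses[k][i] is the mean of loss i for arm k.
    Assumption: entry i is the gap to the best arm for loss i.
    """
    num_losses = len(mean_losses[0])
    # Per-loss reference point: the smallest mean over all arms.
    best = [min(arm[i] for arm in mean_losses) for i in range(num_losses)]
    return [[arm[i] - best[i] for i in range(num_losses)] for arm in mean_losses]


def best_arm_linf(mean_losses: List[List[float]]) -> Tuple[int, List[float]]:
    """Return the arm minimizing the l-infinity norm of its relative losses."""
    rel = relative_losses(mean_losses)
    # The l-infinity norm is the largest absolute entry of the vector.
    norms = [max(abs(x) for x in vec) for vec in rel]
    arm = min(range(len(norms)), key=norms.__getitem__)
    return arm, norms


# Hypothetical instance: 3 arms, 2 losses.
means = [
    [0.10, 0.90],  # best on loss 0, poor on loss 1
    [0.80, 0.20],  # best on loss 1, poor on loss 0
    [0.40, 0.45],  # best on neither, but balanced
]
arm, norms = best_arm_linf(means)
print(arm)
```

In this instance the balanced arm (index 2) is selected even though it is optimal for neither loss, which illustrates why the $\ell^\infty$ criterion differs from optimizing any single loss.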