LGDSJun 16, 2025

Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm

arXiv:2506.13125v1h-index: 19
Originality Incremental advance
AI Analysis

This work addresses the challenge of balancing multiple conflicting objectives in online optimization tasks, such as recommendation systems or resource allocation, with incremental improvements in regret metrics and algorithm design.

The paper tackles the problem of multi-objective multi-armed bandits (MO-MAB) by proposing a novel regret metric to address limitations in existing Pareto regret metrics, and introduces a two-phase algorithm that achieves sublinear regret for Pareto-optimal and efficient Pareto-optimal arms.

Multi-armed bandit (MAB) problems are widely applied to online optimization tasks that require balancing exploration and exploitation. In practical scenarios, these tasks often involve multiple conflicting objectives, giving rise to multi-objective multi-armed bandits (MO-MAB). Existing MO-MAB approaches predominantly rely on the Pareto regret metric introduced in \cite{drugan2013designing}. However, this metric has notable limitations, particularly in accounting for all Pareto-optimal arms simultaneously. To address these challenges, we propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives. Additionally, we introduce the concept of \textit{Efficient Pareto-Optimal} arms, which are specifically designed for online optimization. Based on our new metric, we develop a two-phase MO-MAB algorithm that achieves sublinear regret for both Pareto-optimal and efficient Pareto-optimal arms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes