Bandit Pareto Set Identification in a Multi-Output Linear Model
This work addresses the problem of efficiently identifying non-dominated arms in multi-output bandit settings for applications like recommendation systems, though it appears incremental as it builds on existing bandit and Pareto set frameworks.
The paper tackles the Pareto Set Identification problem in a multi-output linear bandit model by developing optimal design-based algorithms, achieving nearly optimal guarantees in fixed-budget and fixed-confidence settings, with results showing that task difficulty depends on the sub-optimality gaps of only h arms.
We study the Pareto Set Identification (PSI) problem in a structured multi-output linear bandit model. In this setting, each arm is associated a feature vector belonging to $\mathbb{R}^h$, and its mean vector in $\mathbb{R}^d$ linearly depends on this feature vector through a common unknown matrix $Θ\in \mathbb{R}^{h \times d}$. The goal is to identify the set of non-dominated arms by adaptively collecting samples from the arms. We introduce and analyze the first optimal design-based algorithms for PSI, providing nearly optimal guarantees in both the fixed-budget and the fixed-confidence settings. Notably, we show that the difficulty of these tasks mainly depends on the sub-optimality gaps of $h$ arms only. Our theoretical results are supported by an extensive benchmark on synthetic and real-world datasets.