Zeroth Order Non-convex optimization with Dueling-Choice Bandits
This work addresses optimization challenges in scenarios with limited feedback, such as hyperparameter tuning or black-box functions, by leveraging comparisons to reduce search space, though it is incremental as it builds on existing GP-UCB methods.
The paper tackles the problem of zeroth-order non-convex optimization by introducing a setting where both direct function queries and duels (comparisons between two points) are available, and it presents the COMP-GP-UCB algorithm, which achieves a theoretical simple regret bound of O(Φ/√T) with an improved information gain Φ compared to direct-query-only methods.
We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform a constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with theoretical guarantee of $O(\fracΦ{\sqrt{T}})$ on simple regret where $T$ is the number of direct queries and $Φ$ is an improved information gain corresponding to a comparison based constraint set that restricts the search space for the optimum. In contrast, in the direct query only setting, $Φ$ depends on the entire domain. Finally, we present experimental results to show the efficacy of our algorithm.