LGMLNov 3, 2019

Zeroth Order Non-convex optimization with Dueling-Choice Bandits

arXiv:1911.00980v117 citations
Originality Incremental advance
AI Analysis

This work addresses optimization challenges in scenarios with limited feedback, such as hyperparameter tuning or black-box functions, by leveraging comparisons to reduce search space, though it is incremental as it builds on existing GP-UCB methods.

The paper tackles the problem of zeroth-order non-convex optimization by introducing a setting where both direct function queries and duels (comparisons between two points) are available, and it presents the COMP-GP-UCB algorithm, which achieves a theoretical simple regret bound of O(Φ/√T) with an improved information gain Φ compared to direct-query-only methods.

We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform a constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with theoretical guarantee of $O(\fracΦ{\sqrt{T}})$ on simple regret where $T$ is the number of direct queries and $Φ$ is an improved information gain corresponding to a comparison based constraint set that restricts the search space for the optimum. In contrast, in the direct query only setting, $Φ$ depends on the entire domain. Finally, we present experimental results to show the efficacy of our algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes