Comparisons Are All You Need for Optimizing Smooth Functions
This addresses scenarios like preference-based RL where gradients are unavailable, offering a derivative-free approach, though it is incremental as it matches existing query complexities.
The paper tackles optimization of smooth functions using only pairwise comparisons, showing that for convex functions, algorithms require O~(n/ε) or O~(n²) queries to find an ε-optimal solution, and for nonconvex functions, O~(n/ε²) queries to find an ε-approximate stationary point, matching zeroth-order methods with function evaluations.
When optimizing machine learning models, there are various scenarios where gradient computations are challenging or even infeasible. Furthermore, in reinforcement learning (RL), preference-based RL that only compares between options has wide applications, including reinforcement learning with human feedback in large language models. In this paper, we systematically study optimization of a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$ only assuming an oracle that compares function values at two points and tells which is larger. When $f$ is convex, we give two algorithms using $\tilde{O}(n/ε)$ and $\tilde{O}(n^{2})$ comparison queries to find an $ε$-optimal solution, respectively. When $f$ is nonconvex, our algorithm uses $\tilde{O}(n/ε^2)$ comparison queries to find an $ε$-approximate stationary point. All these results match the best-known zeroth-order algorithms with function evaluation queries in $n$ dependence, thus suggest that \emph{comparisons are all you need for optimizing smooth functions using derivative-free methods}. In addition, we also give an algorithm for escaping saddle points and reaching an $ε$-second order stationary point of a nonconvex $f$, using $\tilde{O}(n^{1.5}/ε^{2.5})$ comparison queries.