LGSep 19, 2025

The Multi-Query Paradox in Zeroth-Order Optimization

arXiv:2509.15552v22 citationsh-index: 2

Originality Incremental advance

AI Analysis

This work addresses a fundamental trade-off in optimization for scenarios where gradients are unavailable, such as in black-box machine learning, though it is incremental in refining existing paradigms.

The paper tackles the problem of query allocation in zeroth-order optimization, where using multiple queries per iteration to reduce gradient estimation variance conflicts with the total number of iterations under a fixed query budget. It resolves this by showing that for simple averaging, single-query is optimal, while a new projection alignment method performs better with more queries, leading to full-subspace estimation as optimal.

Zeroth-order (ZO) optimization provides a powerful framework for problems where explicit gradients are unavailable and have to be approximated using only queries to function value. The prevalent single-query approach is simple, but suffers from high estimation variance, motivating a multi-query paradigm to improves estimation accuracy. This, however, creates a critical trade-off: under a fixed budget of queries (i.e. cost), queries per iteration and the total number of optimization iterations are inversely proportional to one another. How to best allocate this budget is a fundamental, under-explored question. This work systematically resolves this query allocation problem. We analyze two aggregation methods: the de facto simple averaging (ZO-Avg), and a new Projection Alignment method (ZO-Align) we derive from local surrogate minimization. By deriving convergence rates for both methods that make the dependence on the number of queries explicit across strongly convex, convex, non-convex, and stochastic settings, we uncover a stark dichotomy: For ZO-Avg, we prove that using more than one query per iteration is always query-inefficient, rendering the single-query approach optimal. On the contrary, ZO-Align generally performs better with more queries per iteration, resulting in a full-subspace estimation as the optimal approach. Thus, our work clarifies that the multi-query problem boils down to a choice not about an intermediate query size, but between two classic algorithms, a choice dictated entirely by the aggregation method used. These theoretical findings are also consistently validated by extensive experiments.

View on arXiv PDF

Similar