LGSep 19, 2025

The Multi-Query Paradox in Zeroth-Order Optimization

arXiv:2509.15552v22 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses a fundamental trade-off in optimization for scenarios where gradients are unavailable, such as in black-box machine learning, though it is incremental in refining existing paradigms.

The paper tackles the problem of query allocation in zeroth-order optimization, where using multiple queries per iteration to reduce gradient estimation variance conflicts with the total number of iterations under a fixed query budget. It resolves this by showing that for simple averaging, single-query is optimal, while a new projection alignment method performs better with more queries, leading to full-subspace estimation as optimal.

Zeroth-order (ZO) optimization provides a powerful framework for problems where explicit gradients are unavailable and have to be approximated using only queries to function value. The prevalent single-query approach is simple, but suffers from high estimation variance, motivating a multi-query paradigm to improves estimation accuracy. This, however, creates a critical trade-off: under a fixed budget of queries (i.e. cost), queries per iteration and the total number of optimization iterations are inversely proportional to one another. How to best allocate this budget is a fundamental, under-explored question. This work systematically resolves this query allocation problem. We analyze two aggregation methods: the de facto simple averaging (ZO-Avg), and a new Projection Alignment method (ZO-Align) we derive from local surrogate minimization. By deriving convergence rates for both methods that make the dependence on the number of queries explicit across strongly convex, convex, non-convex, and stochastic settings, we uncover a stark dichotomy: For ZO-Avg, we prove that using more than one query per iteration is always query-inefficient, rendering the single-query approach optimal. On the contrary, ZO-Align generally performs better with more queries per iteration, resulting in a full-subspace estimation as the optimal approach. Thus, our work clarifies that the multi-query problem boils down to a choice not about an intermediate query size, but between two classic algorithms, a choice dictated entirely by the aggregation method used. These theoretical findings are also consistently validated by extensive experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes