LG · May 11

Compander-Aligned Query Geometry for Quantized Zeroth-Order Optimization

arXiv:2605.10673
AI Analysis

For practitioners of memory-efficient zeroth-order optimization, this work provides a principled fix to a previously overlooked quantization artifact, enabling better fine-tuning with low-bit forward passes.

The paper shows that quantized zeroth-order optimization suffers from a query-geometry mismatch under nonuniform quantization and proposes CAQ-ZO, which aligns queries with the compander's grid to eliminate endpoint-rounding residuals. In theory, the query-time residual becomes exactly zero; empirically, the method improves NF4 Qwen/Llama fine-tuning over the trained NF4 baseline under the same quantizer and evaluation budget.

Low-bit forward evaluation is an attractive route to memory-efficient zeroth-order (ZO) adaptation: the optimizer needs only scalar losses, and the model can be queried near deployment precision. The obstacle is that a quantized ZO query is not a continuous finite difference followed by harmless storage rounding. The query chooses endpoints, the low-precision engine rounds them, and the loss difference is measured along the rounded chord. For nonuniform companding quantizers, this makes the codebook insufficient to predict ZO behavior: a fixed weight-space radius can collapse in dense cells, over-span sparse cells, or assign a rounded chord to an unrounded update direction. We identify the missing object as query geometry and model scalar nonuniform quantization as $Q = ϕ^{-1} \circ U \circ ϕ$. CAQ-ZO (Compander-Aligned Queries for Zeroth-Order Optimization) forms one-grid-step Rademacher stencils $z \pm Δr$ in $z = ϕ(x)$, maps endpoints back through $ϕ^{-1}$, and updates in $z$. Our theory proves the grid-span mismatch, decomposes endpoint-rounding estimator residuals, and gives stationarity bounds in which generic off-grid queries retain a $Δ^2/μ^2$ residual channel while CAQ-ZO makes the query-time residual exactly zero. Synthetic experiments isolate this channel, and matched NF4 Qwen/Llama fine-tuning shows that CAQ-ZO improves the trained NF4 baseline under the same quantizer and evaluation budget.
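The query construction described above can be illustrated with a small NumPy sketch. It is not the paper's implementation: the signed-square-root compander, the grid step `DELTA`, the quadratic toy loss, and all helper names (`aligned_stencil`, `generic_stencil`, `rounding_residual`, `zo_estimate`) are assumptions chosen for illustration, and NF4 itself is not modeled. The sketch builds $Q = ϕ^{-1} \circ U \circ ϕ$, forms one-grid-step Rademacher stencils in $z$, and compares the endpoint-rounding displacement against a fixed-radius weight-space stencil.

```python
import numpy as np

DELTA = 0.125  # assumed uniform grid step in the companded coordinate z


def phi(x):
    """Toy compander (an assumption of this sketch): signed square root."""
    return np.sign(x) * np.sqrt(np.abs(x))


def phi_inv(z):
    """Inverse of the toy compander: signed square."""
    return np.sign(z) * np.square(z)


def U(z):
    """Uniform rounding to the grid DELTA * integer in z."""
    return DELTA * np.round(z / DELTA)


def Q(x):
    """Scalar nonuniform quantizer Q = phi^{-1} o U o phi."""
    return phi_inv(U(phi(x)))


def aligned_stencil(z, rng):
    """One-grid-step Rademacher stencil z +/- DELTA * r, mapped back through phi_inv."""
    r = rng.choice([-1.0, 1.0], size=z.shape)
    return phi_inv(z + DELTA * r), phi_inv(z - DELTA * r), r


def generic_stencil(x, radius, rng):
    """Fixed-radius Rademacher stencil formed directly in weight space."""
    r = rng.choice([-1.0, 1.0], size=x.shape)
    return x + radius * r, x - radius * r, r


def rounding_residual(x_plus, x_minus):
    """Largest displacement of either endpoint when the engine applies Q."""
    return max(np.max(np.abs(Q(x_plus) - x_plus)),
               np.max(np.abs(Q(x_minus) - x_minus)))


def zo_estimate(loss, x_plus, x_minus, r, step):
    """Two-point estimate along r, with the loss evaluated at the rounded endpoints."""
    return (loss(Q(x_plus)) - loss(Q(x_minus))) / (2.0 * step) * r


rng = np.random.default_rng(0)
z = U(rng.uniform(-1.0, 1.0, size=16))  # on-grid iterate in z
x = phi_inv(z)                          # the same iterate in weight space
loss = lambda w: float(np.sum((w - 0.3) ** 2))  # hypothetical toy loss

xp, xm, r = aligned_stencil(z, rng)
print("aligned residual:", rounding_residual(xp, xm))
g_z = zo_estimate(loss, xp, xm, r, DELTA)  # direction estimate expressed in z

gp, gm, rg = generic_stencil(x, 0.01, rng)
print("generic residual:", rounding_residual(gp, gm))
```

In this toy setting the aligned endpoints are already codebook values, so applying `Q` leaves them unchanged, while the fixed-radius endpoints are generally moved by `Q` before the loss is measured. How the update in $z$ is applied after the estimate is left out, since the abstract does not specify it.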

