81.9 | CE | Jun 1 | Code
ZOAF: Towards Efficient Zeroth-Order Optimization for Analog/RF Circuit Design
Liyan Tan, Yequan Zhao, Jinming Lu et al.
Circuit optimization is an indispensable step in analog/RF IC design. Classical gradient-based optimization methods, although fast, are typically infeasible because designers lack access to simulator source code and face technical barriers to implementing adjoint methods. Surrogate-based black-box optimization is therefore widely used in practice. However, surrogates can be costly to build and sensitive to hyperparameters, while population heuristics often suffer from slow convergence and large evaluation counts under tight simulator-call budgets. To address these limitations, we propose the Zeroth-Order Analog/RF Framework (ZOAF). ZOAF recovers gradient-descent directions from a small number of black-box circuit simulations, combining the benefits of gradient-based and black-box optimization. We also employ several surrogate-free techniques to improve efficiency and accuracy: (1) a hybrid ZO scheduling method that switches between random-direction ZO for budget-efficient exploration and coordinate-wise ZO for accurate late-stage refinement, (2) one-shot quasi-random multi-start to focus evaluations, and (3) a sliding-window monitor that triggers early stops and box-projected updates to maintain feasibility. We evaluate ZOAF on three distinct schematics. It consistently outperforms state-of-the-art baselines and achieves the best median final value on every reported figure of merit, with up to an order-of-magnitude advantage in median peaking on the 22-parameter two-stage amplifier. It also shows the most robust worst-case behavior across seeds, while reducing simulator calls to convergence by $1.3$--$3.8\times$. Code is publicly available at https://github.com/LiyanTan111/ZOAF.
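The following NumPy sketch illustrates the hybrid schedule, the box projection, and the sliding-window stop described above. It is not the ZOAF implementation. The helper names (`zo_random_grad`, `zo_coord_grad`, `hybrid_zo`) and all parameter values are hypothetical: the switch point `switch_frac`, step size `lr`, smoothing radius `mu`, direction count `n_dirs`, window length, and tolerance. The quasi-random multi-start is omitted, and a cheap analytic function `f` stands in for the circuit simulator.

```python
import numpy as np


def zo_random_grad(f, x, mu, n_dirs, rng):
    """Random-direction ZO estimate (Gaussian forward differences); costs 1 + n_dirs calls."""
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / n_dirs, 1 + n_dirs


def zo_coord_grad(f, x, mu):
    """Coordinate-wise ZO estimate (central differences); costs 2 * dim calls."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (f(x + e) - f(x - e)) / (2 * mu)
    return g, 2 * x.size


def hybrid_zo(f, x0, lo, hi, budget, switch_frac=0.6, lr=0.05,
              mu=1e-3, n_dirs=4, window=5, tol=1e-6, seed=0):
    """Hypothetical hybrid schedule: random-direction ZO early, coordinate-wise ZO late."""
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x0, dtype=float), lo, hi)
    calls, history = 0, []
    while calls < budget:
        if calls < switch_frac * budget:
            g, used = zo_random_grad(f, x, mu, n_dirs, rng)  # cheap exploration
        else:
            g, used = zo_coord_grad(f, x, mu)                # accurate refinement
        calls += used
        x = np.clip(x - lr * g, lo, hi)                      # box-projected update
        history.append(f(x))                                 # one call to monitor progress
        calls += 1
        # Sliding-window monitor: stop when the objective has not improved
        # by more than `tol` over the last `window` iterations.
        if len(history) > window and history[-window - 1] - history[-1] < tol:
            break
    return x, calls
```

The random-direction estimator costs a fixed $1 + n_{\text{dirs}}$ calls regardless of dimension. The coordinate-wise estimator costs $2d$ calls but gives a far less noisy gradient, which is why the sketch spends it only late in the budget. Perturbed evaluation points can step up to `mu` outside the box, so a real simulator flow might need to handle such points.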
67.5 | LG | Jun 1
GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning
Liyan Tan, Yequan Zhao, Yifan Yang et al.
Zeroth-order (ZO) optimization is a memory-efficient alternative to backpropagation for fine-tuning large language models, but the high variance of its gradient estimates limits its deployment. We propose GRZO, a Group-Relative Zeroth-Order optimizer that draws one pseudo-independent perturbation per mini-batch example and aggregates the per-example losses through group-relative normalization. This raises the effective number of gradient directions from one to the batch size at no additional forward cost, while preserving inference-level memory. We prove that GRZO is directionally unbiased, with variance that shrinks in proportion to the batch size, yielding a tighter nonconvex convergence bound than MeZO. Across RoBERTa-large, Llama3-8B, and OPT-13B on multiple tasks, GRZO improves average accuracy on Llama3-8B by $3.0$ points over MeZO at $23\%$ lower peak GPU memory. As a drop-in replacement for the MeZO core, it improves sparse, low-rank, and quantized ZO variants by $6.0$ points on average.
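The NumPy sketch below shows one plausible reading of the per-example perturbation and group-relative aggregation. This summary does not give GRZO's exact normalization. The mean-centering and standard-deviation scaling used here, in the style of group-relative advantages, are therefore assumptions, as are the function name `grzo_direction` and its arguments. For clarity, the sketch stores every perturbation explicitly and evaluates each example separately. A memory-efficient implementation would instead regenerate perturbations from random seeds, as MeZO does.

```python
import numpy as np


def grzo_direction(example_loss, theta, batch, eps, seed):
    """Hypothetical group-relative ZO direction for one mini-batch.

    example_loss(theta, example) -> scalar loss of a single example.
    """
    rng = np.random.default_rng(seed)
    B = len(batch)
    Z = rng.standard_normal((B, theta.size))  # one perturbation per example
    c = np.empty(B)
    for i, ex in enumerate(batch):
        # Two-point (SPSA-style) projected gradient of example i along Z[i].
        lp = example_loss(theta + eps * Z[i], ex)
        lm = example_loss(theta - eps * Z[i], ex)
        c[i] = (lp - lm) / (2 * eps)
    # Assumed group-relative normalization: center and scale within the batch.
    a = (c - c.mean()) / (c.std() + 1e-8)
    # Average the B weighted directions into a single update direction.
    return (a[:, None] * Z).mean(axis=0)
```

A training step would then be `theta = theta - lr * grzo_direction(loss, theta, batch, eps, seed=step)`. By contrast, a MeZO-style step shares a single perturbation $z$ across the whole batch and moves along that one direction.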
91.6 | LG | May 19 | Code
FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning
Yequan Zhao, Ruijie Zhang, Liyan Tan et al.
Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from limited fine-tuning data can perturb robust pretrained features. We identify spectral preconditioning as the missing ingredient. Reparameterizing each weight matrix through its full-rank singular value decomposition (SVD) and freezing one singular basis constrains updates to the pretrained column space. This yields a preconditioned optimization scheme that outperforms unconstrained Full FT at the same trainable parameter count. Building on this insight, we propose FuRA (Full-Rank Adaptation), an efficient full-rank adaptation framework based on a block tensor-train factorization $W = LSR$. The large core $L$ is fixed to the pretrained block-wise SVD basis, and only the compact core $R$ and the block-wise singular values $S$ are optimized. This design provides full-rank spectral preconditioning and preserves full-rank update expressivity, while matching LoRA's parameter, memory, and step-time efficiency. FuRA consistently outperforms Full FT across multiple settings, including LLM fine-tuning (+1.37 on LLaMA-3-8B commonsense reasoning), LLM reinforcement learning for mathematical reasoning, and visual instruction tuning for VLMs. Furthermore, the 4-bit quantized variant, QFuRA, also surpasses QLoRA. Code is available at https://github.com/olokevin/FuRA-NIPS.
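The core reparameterization can be sketched for a single dense matrix in NumPy. This is a simplification, not FuRA itself. It uses one thin SVD in place of the block-wise tensor-train factorization and treats the left singular basis $U$ as the frozen core $L$. It then trains only the singular values and the right core $R$ (initialized to $V^\top$) with plain gradient descent on a toy least-squares loss. The class name `SpectralAdapter` and its methods are hypothetical.

```python
import numpy as np


class SpectralAdapter:
    """Hypothetical single-block sketch of W = U diag(s) R with U frozen."""

    def __init__(self, W):
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        self.U = U          # frozen pretrained left singular basis (m x r)
        self.s = s.copy()   # trainable singular values (r,)
        self.R = Vt.copy()  # trainable right core, initialized to V^T (r x n)

    def weight(self):
        # Reconstruct W = U @ diag(s) @ R without forming diag(s) explicitly.
        return self.U @ (self.s[:, None] * self.R)

    def step(self, grad_W, lr):
        """Chain-rule a gradient dL/dW into updates of s and R only (U stays fixed)."""
        UtG = self.U.T @ grad_W                # (r x n)
        grad_s = np.sum(UtG * self.R, axis=1)  # dL/ds_k = <(U^T G)_k, R_k>
        grad_R = self.s[:, None] * UtG         # dL/dR = diag(s) U^T G
        self.s -= lr * grad_s
        self.R -= lr * grad_R
```

Because the trained weight is always $U$ multiplied by something, every update $\Delta W$ stays in the column space of the frozen basis $U$. This is the constraint the abstract attributes to freezing one singular basis. For a tall matrix ($m > n$), the frozen $U$ is the large factor and the trainable $R$ is the compact $n \times n$ factor.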