LG Jan 31, 2025

Elucidating Subspace Perturbation in Zeroth-Order Optimization: Theory and Practice at Scale

arXiv:2501.19099v2, 2 citations, h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the slow convergence of zeroth-order optimization in black-box tasks such as LLM fine-tuning. It offers a practical speedup, though the advance is incremental.

The paper tackles the slow convergence of zeroth-order optimization methods by analyzing subspace perturbations. It shows that high dimensionality is the primary bottleneck and introduces subspace alignment to explain how these perturbations reduce gradient noise. Building on that analysis, it proposes MeZO-BCD, which achieves up to a 2.77x wall-clock speedup over MeZO on OPT-13B.

Zeroth-order (ZO) optimization has emerged as a promising alternative to gradient-based backpropagation methods, particularly for black-box optimization and large language model (LLM) fine-tuning. However, ZO methods often suffer from slow convergence due to high-variance stochastic gradient estimators. While subspace perturbations, such as sparsity and low-rank constraints, have been explored to mitigate this issue, their effectiveness remains poorly understood. In this work, we develop a unified theoretical framework that analyzes both the convergence and generalization properties of ZO optimization under subspace perturbations. We show that high dimensionality is the primary bottleneck and introduce the notion of subspace alignment to explain how the subspace perturbations reduce gradient noise and accelerate convergence. Our analysis further shows that a broad class of subspace perturbations exhibits a similar convergence rate, motivating us to prioritize practical considerations in real-world algorithm design. Building on these insights, we propose an efficient ZO method using block coordinate descent (MeZO-BCD), which perturbs and updates only a subset of parameters at each step. Extensive experiments show that MeZO-BCD significantly accelerates optimization, achieving up to a 2.77x speedup in wall-clock time over MeZO on OPT-13B, while maintaining comparable iteration complexity and fine-tuning performance.
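
To make the block-coordinate idea concrete, here is a minimal NumPy sketch of a two-point zeroth-order step that perturbs and updates a single parameter block. It is an illustration under stated assumptions, not the paper's implementation: the names `zo_block_step`, `toy_loss`, and `run_demo` are hypothetical, the loss is a toy quadratic rather than an LLM objective, blocks are visited in cyclic order (the abstract does not specify a schedule), and the step size and perturbation scale are arbitrary choices.

```python
import numpy as np


def zo_block_step(loss_fn, blocks, idx, lr, eps, rng):
    """Perturb and update only blocks[idx] using a two-point ZO estimate."""
    # Random direction is drawn for the active block only (assumed Gaussian).
    z = rng.standard_normal(blocks[idx].shape)

    blocks[idx] += eps * z          # evaluate at theta + eps * z
    loss_plus = loss_fn(blocks)
    blocks[idx] -= 2.0 * eps * z    # evaluate at theta - eps * z
    loss_minus = loss_fn(blocks)
    blocks[idx] += eps * z          # restore the unperturbed parameters

    # Scalar finite-difference estimate of the directional derivative along z.
    g = (loss_plus - loss_minus) / (2.0 * eps)
    blocks[idx] -= lr * g * z       # all other blocks stay untouched


def toy_loss(blocks):
    """Toy objective (assumption): sum of squared entries over all blocks."""
    return float(sum(np.sum(b ** 2) for b in blocks))


def run_demo(steps=600, seed=0):
    """Run cyclic block-coordinate ZO steps; return (initial, final) loss."""
    rng = np.random.default_rng(seed)
    blocks = [rng.standard_normal(4) for _ in range(3)]
    initial = toy_loss(blocks)
    for t in range(steps):
        zo_block_step(toy_loss, blocks, t % len(blocks),
                      lr=0.05, eps=1e-3, rng=rng)
    return initial, toy_loss(blocks)
```

Calling `run_demo()` returns the loss before and after the loop, which should show the objective decreasing on this toy problem. Each step generates noise for, and writes to, only one block, so the per-step cost of perturbation and update scales with the block size rather than the full parameter count. That is the kind of practical saving the abstract points to, though this sketch makes no claim about the reported speedup.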

