LGMay 15
Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not UnderpoweredSijia Liu, Yicheng Lang, Soumyadeep Pal et al.
Zeroth-order (ZO) optimization, learning from finite differences of function evaluations without backpropagation, has recently regained attention in deep learning due to its memory efficiency and applicability to gray- or black-box pipelines. Yet, ZO methods are often dismissed as fundamentally unscalable because of estimator variance and unfavorable query complexity. We argue that this conclusion might be misguided: ZO optimization is underexplored, not underpowered. We show that many perceived limitations stem from myopic development practices, most notably full-space, element-wise, estimator-centric designs. We articulate six positions spanning the algorithmic, systems, and evaluation stack. First, we revisit the feasibility boundaries of estimator-centric ZO methods through variance control, variance-query tradeoffs, and directional-derivative lenses. Then, we identify three underexplored opportunities: (i) subspace and spectral views of ZO that enable interpretable variance reduction with graceful query scaling, (ii) the forward-only nature of ZO as a systems advantage for communication-efficient, pipeline-friendly, and resource-constrained training, and (iii) the need to de-obfuscate ZO evaluations from task complexity. We strongly advocate rethinking ZO optimization around its unique strengths and acting accordingly, opening a viable path toward large-scale, system-aware, and resource-efficient learning with ZO optimization.
LGApr 5Code
Subspace Control: Turning Constrained Model Steering into Controllable Spectral OptimizationYancheng Huang, Changsheng Wang, Chongyu Fan et al.
Foundation models, such as large language models (LLMs), are powerful but often require customization before deployment to satisfy practical constraints such as safety, privacy, and task-specific requirements, leading to "constrained" optimization problems for model steering and adaptation. However, solving such problems remains largely underexplored and is particularly challenging due to interference between the primary objective and constraint objectives during optimization. In this paper, we propose a subspace control framework for constrained model training. Specifically, (i) we first analyze, from a model merging perspective, how spectral cross-task interference arises and show that it can be resolved via a one-shot solution that orthogonalizes the merged subspace; (ii) we establish a connection between this solution and gradient orthogonalization in the spectral optimizer Muon; and (iii) building on these insights, we introduce SIFT (spectral interference-free training), which leverages a localization scheme to selectively intervene during optimization, enabling controllable updates that mitigate objective-constraint conflicts. We evaluate SIFT across four representative applications: (a) machine unlearning, (b) safety alignment, (c) text-to-speech adaptation, and (d) hallucination mitigation. Compared to both control-based and control-free baselines, SIFT consistently achieves substantial and robust performance improvements across all tasks. Code is available at https://github.com/OPTML-Group/SIFT.
MLMay 3, 2024
Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and ConvergenceYancheng Huang, Kai Yang, Zelin Zhu et al.
The primary goal of online change detection (OCD) is to promptly identify changes in the data stream. OCD problem find a wide variety of applications in diverse areas, e.g., security detection in smart grids and intrusion detection in communication networks. Prior research usually assumes precise knowledge of the system parameters. Nevertheless, this presumption often proves unattainable in practical scenarios due to factors such as estimation errors, system updates, etc. This paper aims to take the first attempt to develop a triadic-OCD framework with certifiable robustness, provable optimality, and guaranteed convergence. In addition, the proposed triadic-OCD algorithm can be realized in a fully asynchronous distributed manner, easing the necessity of transmitting the data to a single server. This asynchronous mechanism could also mitigate the straggler issue that faced by traditional synchronous algorithm. Moreover, the non-asymptotic convergence property of Triadic-OCD is theoretically analyzed, and its iteration complexity to achieve an $ε$-optimal point is derived. Extensive experiments have been conducted to elucidate the effectiveness of the proposed method.
LGNov 17, 2025
RoS-Guard: Robust and Scalable Online Change Detection with Delay-Optimal GuaranteesZelin Zhu, Yancheng Huang, Kai Yang
Online change detection (OCD) aims to rapidly identify change points in streaming data and is critical in applications such as power system monitoring, wireless network sensing, and financial anomaly detection. Existing OCD methods typically assume precise system knowledge, which is unrealistic due to estimation errors and environmental variations. Moreover, existing OCD methods often struggle with efficiency in large-scale systems. To overcome these challenges, we propose RoS-Guard, a robust and optimal OCD algorithm tailored for linear systems with uncertainty. Through a tight relaxation and reformulation of the OCD optimization problem, RoS-Guard employs neural unrolling to enable efficient parallel computation via GPU acceleration. The algorithm provides theoretical guarantees on performance, including expected false alarm rate and worst-case average detection delay. Extensive experiments validate the effectiveness of RoS-Guard and demonstrate significant computational speedup in large-scale system scenarios.
LGOct 8, 2025
LLM Unlearning Under the Microscope: A Full-Stack View on Methods and MetricsChongyu Fan, Changsheng Wang, Yancheng Huang et al.
Machine unlearning for large language models (LLMs) aims to remove undesired data, knowledge, and behaviors (e.g., for safety, privacy, or copyright) while preserving useful model capabilities. Despite rapid progress over the past two years, research in LLM unlearning remains fragmented, with limited clarity on what constitutes effective unlearning and how it should be rigorously evaluated. In this work, we present a principled taxonomy of twelve recent stateful unlearning methods, grouped into three methodological families: divergence-driven optimization, representation misalignment, and rejection-based targeted unlearning. Building on this taxonomy, we revisit the evaluation of unlearning effectiveness (UE), utility retention (UT), and robustness (Rob), focusing on the WMDP benchmark. Our analysis shows that current evaluations, dominated by multiple-choice question (MCQ) accuracy, offer only a narrow perspective, often overstating success while overlooking the model's actual generation behavior. To address this gap, we introduce open question-answering (Open-QA) metrics that better capture generative performance and reveal the inherent UE-UT tradeoff across method families. Furthermore, we demonstrate that robustness requires finer-grained analysis: for example, vulnerabilities differ substantially between in-domain relearning and out-of-domain fine-tuning, even though both fall under model-level attacks. Through this study, we hope to deliver a full-stack revisit of LLM unlearning and actionable guidance for designing and evaluating future methods.