OCLGMLApr 7, 2020

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

arXiv:2004.03639v230 citations
AI Analysis

This addresses the need for efficient sparsity-inducing methods in machine learning for tasks like feature selection and model compression, representing an incremental improvement over existing stochastic gradient methods.

The paper tackles the problem of l1-regularized optimization for sparsity in machine learning by proposing the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), which enhances sparsity through orthant face projection and outperforms existing methods in sparsity exploration and objective values on convex problems, achieving higher sparsity without sacrificing accuracy in non-convex deep neural networks like MobileNetV1 and ResNet18.

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the l1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, the OBProx-SG not only converges to the global optimal solutions (in convex scenario) or the stationary points (in non-convex scenario), but also promotes the sparsity of the solutions substantially. Particularly, on a large number of convex problems, OBProx-SG outperforms the existing methods comprehensively in the aspect of sparsity exploration and objective values. Moreover, the experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving the solutions of much higher sparsity without sacrificing generalization accuracy.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes