OC LG MLApr 7, 2020

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

arXiv:2004.03639v216.330 citationsHas Code

Originality Incremental advance

AI Analysis

This addresses the need for efficient sparsity-inducing methods in machine learning for tasks like feature selection and model compression, representing an incremental improvement over existing stochastic gradient methods.

The paper tackles the problem of l1-regularized optimization for sparsity in machine learning by proposing the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), which enhances sparsity through orthant face projection and outperforms existing methods in sparsity exploration and objective values on convex problems, achieving higher sparsity without sacrificing accuracy in non-convex deep neural networks like MobileNetV1 and ResNet18.

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the l1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, the OBProx-SG not only converges to the global optimal solutions (in convex scenario) or the stationary points (in non-convex scenario), but also promotes the sparsity of the solutions substantially. Particularly, on a large number of convex problems, OBProx-SG outperforms the existing methods comprehensively in the aspect of sparsity exploration and objective values. Moreover, the experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving the solutions of much higher sparsity without sacrificing generalization accuracy.

View on arXiv PDF Code

Similar