ML LG OCJun 2, 2020

Sparse Perturbations for Improved Convergence in Stochastic Zeroth-Order Optimization

Mayumi Ohta, Nathaniel Berger, Artem Sokolov, Stefan Riezler

arXiv:2006.01759v29.610 citationsHas Code

Originality Incremental advance

AI Analysis

This addresses a key bottleneck in black-box optimization for applications like adversarial attacks, though it appears incremental as it modifies existing SZO methods rather than introducing a new paradigm.

The paper tackles the problem of slow convergence in stochastic zeroth-order optimization methods due to dimensionality dependency by introducing a sparse perturbation method that reduces this dependency to the expected perturbation dimensionality. Experimental results on MNIST and CIFAR neural networks show faster convergence in training loss and test accuracy, with a smaller gradient approximation error compared to dense methods.

Interest in stochastic zeroth-order (SZO) methods has recently been revived in black-box optimization scenarios such as adversarial black-box attacks to deep neural networks. SZO methods only require the ability to evaluate the objective function at random input points, however, their weakness is the dependency of their convergence speed on the dimensionality of the function to be evaluated. We present a sparse SZO optimization method that reduces this factor to the expected dimensionality of the random perturbation during learning. We give a proof that justifies this reduction for sparse SZO optimization for non-convex functions without making any assumptions on sparsity of objective function or gradient. Furthermore, we present experimental results for neural networks on MNIST and CIFAR that show faster convergence in training loss and test accuracy, and a smaller distance of the gradient approximation to the true gradient in sparse SZO compared to dense SZO.

View on arXiv PDF Code

Similar