OC LGFeb 10, 2020

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

arXiv:2002.04130v321.847 citations

Originality Highly original

AI Analysis

This addresses a foundational challenge in optimization for machine learning, particularly for training neural networks with non-differentiable activations like ReLU, though it is incremental in refining existing methods.

The paper tackles the problem of finding stationary points for nonsmooth nonconvex functions, showing that exact solutions are impossible in finite time and introducing a relaxed notion of stationarity with randomized first-order methods that achieve min-max optimal complexity and perform well in training ReLU neural networks.

We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $ε$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(δ, ε)$-stationarity, which allows for an $ε$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $δ$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(δ, ε)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $δ$. Empirically, our methods perform well for training ReLU neural networks.

View on arXiv PDF

Similar