OCLGFeb 10, 2020

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

arXiv:2002.04130v347 citations
AI Analysis

This addresses a foundational challenge in optimization for machine learning, particularly for training neural networks with non-differentiable activations like ReLU, though it is incremental in refining existing methods.

The paper tackles the problem of finding stationary points for nonsmooth nonconvex functions, showing that exact solutions are impossible in finite time and introducing a relaxed notion of stationarity with randomized first-order methods that achieve min-max optimal complexity and perform well in training ReLU neural networks.

We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $ε$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(δ, ε)$-stationarity, which allows for an $ε$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $δ$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(δ, ε)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $δ$. Empirically, our methods perform well for training ReLU neural networks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes