LG · Dec 29, 2017

The Multilinear Structure of ReLU Networks

arXiv:1712.10132v2 · 54 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses a foundational question in machine learning theory, of interest to researchers studying optimization in neural networks, though its contribution is incremental in that it applies existing mathematical frameworks.

The paper studies the loss surface of ReLU neural networks trained with hinge loss. It shows that every local minimum is non-differentiable unless it lies in a perfectly flat region of the loss surface, so nonsmooth analysis techniques are needed to study these minima.

We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space. By appealing to harmonic analysis we show that all local minima of such networks are non-differentiable, except for those minima that occur in a region of parameter space where the loss surface is perfectly flat. Non-differentiable minima are therefore not technicalities or pathologies; they are the heart of the problem when investigating the loss of ReLU networks. As a consequence, we must employ techniques from nonsmooth analysis to study these loss surfaces. We show how to apply these techniques in some illustrative cases.
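
The "piecewise multilinear" claim in the abstract can be checked numerically on a toy case. The sketch below is not taken from the paper: it assumes a bias-free two-layer network f(x) = v · relu(W x) with mean hinge loss, and all names (`hinge_loss`, `region`, `shrink_into_region`) are hypothetical helpers introduced for illustration. Within one activation region (fixed ReLU on/off pattern and fixed set of samples with active margin), the loss should be affine in W when v is held fixed, and affine in v when W is held fixed, so the loss at the midpoint of a segment equals the average of the endpoint losses.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hinge_loss(W, v, X, y):
    """Mean hinge loss of f(x) = v . relu(W x) with labels y in {-1, +1}."""
    scores = relu(X @ W.T) @ v
    return float(np.mean(np.maximum(0.0, 1.0 - y * scores)))

def region(W, v, X, y):
    """Activation pattern identifying the current piece of parameter space."""
    pre = X @ W.T
    hidden_on = pre > 0                              # which ReLUs are active
    margin_on = (1.0 - y * (relu(pre) @ v)) > 0      # which samples incur loss
    return hidden_on, margin_on

def same_region(a, b):
    return all(np.array_equal(p, q) for p, q in zip(a, b))

def shrink_into_region(W, v, dW, dv, X, y):
    """Halve the step until both endpoints share one activation pattern."""
    base = region(W, v, X, y)
    for _ in range(60):
        if same_region(base, region(W + dW, v + dv, X, y)):
            return dW, dv
        dW, dv = dW / 2, dv / 2
    raise RuntimeError("no common region found")

# Toy data and parameters (arbitrary sizes, chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.choice([-1.0, 1.0], size=20)
W = rng.normal(size=(4, 3))
v = rng.normal(size=4)

# Move only the first layer; v is held fixed.
dW, _ = shrink_into_region(W, v, 1e-3 * rng.normal(size=W.shape),
                           np.zeros_like(v), X, y)
mid_W = hinge_loss(W + 0.5 * dW, v, X, y)
avg_W = 0.5 * (hinge_loss(W, v, X, y) + hinge_loss(W + dW, v, X, y))

# Move only the second layer; W is held fixed.
_, dv = shrink_into_region(W, v, np.zeros_like(W),
                           1e-3 * rng.normal(size=v.shape), X, y)
mid_v = hinge_loss(W, v + 0.5 * dv, X, y)
avg_v = 0.5 * (hinge_loss(W, v, X, y) + hinge_loss(W, v + dv, X, y))

# Both gaps should be zero up to floating-point rounding.
print(abs(mid_W - avg_W), abs(mid_v - avg_v))
```

Checking only the endpoints is enough here because, when a single layer moves, every pre-activation and every margin is a linear function of the step, so a sign that agrees at both endpoints agrees along the whole segment. Moving both layers at once would make the loss quadratic along the segment, which is the sense in which the form is multilinear rather than linear.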

