LG · ML · Jul 2, 2019

Best k-layer neural network approximations

arXiv:1907.01507v2 · 6 citations
Originality: Incremental advance
AI Analysis

This addresses a foundational question in machine-learning optimization, exposing a limitation of neural network training that is relevant to practitioners in general. It is rated incremental because its explanation builds on a known geometric analogy with tensor approximation.

The paper tackles the problem of empirical risk minimization (ERM) for neural networks, showing that it has no solution in general (the infimum of the training loss need not be attained), even for two-layer networks with common activations such as ReLU, hyperbolic tangent, and sigmoid, and it provides a geometric explanation for this failure.

We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set $s_1, \dots, s_n \in \mathbb{R}^p$ with corresponding responses $t_1,\dots,t_n \in \mathbb{R}^q$, fitting a $k$-layer neural network $\nu_\theta: \mathbb{R}^p \to \mathbb{R}^q$ involves estimating the weights $\theta \in \mathbb{R}^m$ via an ERM: \[ \inf_{\theta \in \mathbb{R}^m} \; \sum_{i=1}^n \lVert t_i - \nu_\theta(s_i) \rVert_2^2. \] We show that even for $k = 2$, this infimum is not attainable in general for common activations such as the ReLU, hyperbolic tangent, and sigmoid functions. The high-level explanation is similar to that for the nonexistence of best rank-$r$ approximations of higher-order tensors (the set of networks parameterized by $\theta$ is not a closed set), but the geometry involved for best $k$-layer neural network approximations is more subtle. In addition, we show that for the smooth activations $\sigma(x) = 1/\bigl(1 + \exp(-x)\bigr)$ and $\sigma(x) = \tanh(x)$, such failure to attain an infimum can happen on a subset of responses of positive measure. For the ReLU activation $\sigma(x) = \max(0,x)$, we completely classify the cases where the ERM for a best two-layer neural network approximation attains its infimum. As an aside, we obtain a precise description of the geometry of the space of two-layer neural networks with $d$ neurons in the hidden layer: it is the join locus of a line and the $d$-secant locus of a cone.
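To make the failure mode concrete, here is a minimal Python sketch; it is not the authors' construction. It assumes a scalar two-layer network ($p = q = 1$) with $d = 2$ hidden $\tanh$ neurons, written as $\nu_\theta(s) = a_1 \tanh(w_1 s + b_1) + a_2 \tanh(w_2 s + b_2) + c$. This parameterization, the sample points, and the helper names (`two_layer_net`, `empirical_risk`, `dtanh`) are illustrative assumptions. The responses are set to $t_i = \tanh'(s_i)$, and the parameter sequence $\theta_n$ realizes the difference quotient $n\bigl(\tanh(s + 1/n) - \tanh(s)\bigr)$, which converges to $\tanh'(s)$.

```python
import math


def two_layer_net(s, W, b, A, c, act=math.tanh):
    """Evaluate nu(s) = sum_j A[j] * act(W[j] * s + b[j]) + c.

    Hypothetical parameterization: scalar input and output (p = q = 1),
    len(W) hidden neurons, one output bias c.
    """
    return sum(a * act(w * s + bj) for w, bj, a in zip(W, b, A)) + c


def empirical_risk(samples, responses, W, b, A, c, act=math.tanh):
    """ERM objective: sum_i (t_i - nu(s_i))^2 for scalar responses."""
    return sum((t - two_layer_net(s, W, b, A, c, act)) ** 2
               for s, t in zip(samples, responses))


def dtanh(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Illustrative training set; responses are tanh'(s_i).
samples = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
responses = [dtanh(s) for s in samples]

risks = []
max_output_weights = []
for n in [1, 10, 100, 1000]:
    # theta_n encodes nu_n(s) = n * (tanh(s + 1/n) - tanh(s)), a difference
    # quotient that converges pointwise to tanh'(s) as n grows.
    W = [1.0, 1.0]
    b = [1.0 / n, 0.0]
    A = [float(n), -float(n)]
    c = 0.0
    risks.append(empirical_risk(samples, responses, W, b, A, c))
    max_output_weights.append(max(abs(a) for a in A))
    print(f"n={n:5d}  risk={risks[-1]:.3e}  max|a_j|={max_output_weights[-1]:.0f}")
```

Along this sequence the empirical risk shrinks toward zero while the output weights $\pm n$ diverge, which is how an infimum can be approached without being attained when the set of realizable networks is not closed. The sketch does not prove non-attainment for these seven points, since some other finite $\theta$ might fit them exactly; the general non-attainment results are the paper's, not this script's.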

