LG · DS · ML · Jul 21, 2021

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

arXiv:2107.10209v2 · 24 citations
Originality: Incremental advance
AI Analysis

For researchers in neural network theory, this work removes a key limitation of prior methods, which assumed zero bias. It is an incremental but important extension.

The paper tackles the problem of learning depth-2 neural networks with general ReLU activations, including bias terms. It develops polynomial-time, sample-efficient algorithms under mild non-degeneracy assumptions and also establishes identifiability of the network parameters.

We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = {a}^{\mathsf{T}}σ({W}^\mathsf{T}x+b)$, where $x$ is drawn from the Gaussian distribution, and $σ(t) := \max(t,0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.
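The abstract's point that the bias terms surface in the Hermite expansion of $f(x)$ can be checked numerically. The following is a minimal NumPy sketch, not the authors' algorithm: it draws Gaussian inputs, evaluates $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, and compares Monte Carlo estimates of the first two Hermite moment tensors with closed forms obtained from Gaussian integration by parts (Stein's lemma). The unit-norm columns of $W$, the specific values of $a$ and $b$, and the helper names are illustrative assumptions. The paper's normalization of the Hermite tensors may differ, and its robust tensor-decomposition step is not shown.

```python
import numpy as np
from math import erf, exp, pi, sqrt


def relu(t):
    return np.maximum(t, 0.0)


def network(x, a, W, b):
    """f(x) = a^T relu(W^T x + b) for a batch x of shape (n, d); W has shape (d, m)."""
    return relu(x @ W + b) @ a


def normal_pdf(t):
    return exp(-t * t / 2.0) / sqrt(2.0 * pi)


def normal_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def empirical_moments(x, y):
    """Monte Carlo estimates of E[y x] and E[y (x x^T - I)] (orders 1 and 2)."""
    n, d = x.shape
    t1 = x.T @ y / n
    t2 = (x * y[:, None]).T @ x / n - y.mean() * np.eye(d)
    return t1, t2


def closed_form_moments(a, W, b):
    """Stein's-lemma values of the same moments, ASSUMING unit-norm columns of W:
    E[y x] = sum_i a_i Phi(b_i) w_i,  E[y (x x^T - I)] = sum_i a_i phi(b_i) w_i w_i^T."""
    d, m = W.shape
    t1, t2 = np.zeros(d), np.zeros((d, d))
    for i in range(m):
        w = W[:, i]
        t1 += a[i] * normal_cdf(b[i]) * w
        t2 += a[i] * normal_pdf(b[i]) * np.outer(w, w)
    return t1, t2


def third_order_weight(b_i):
    """Weight of a_i * w_i^{(x)3} in E[y He_3(x)] for a unit-norm w_i: -b_i * phi(b_i).
    It is zero at b_i = 0, so this tensor carries no information when all biases vanish."""
    return -b_i * normal_pdf(b_i)


def demo(n=400_000, seed=0):
    """Return the max absolute gaps between empirical and closed-form moments."""
    rng = np.random.default_rng(seed)
    d = 5
    W = rng.standard_normal((d, 3))
    W /= np.linalg.norm(W, axis=0)   # unit-norm columns (assumption)
    a = np.array([1.0, -0.5, 0.8])    # hypothetical output weights
    b = np.array([0.5, -0.3, 1.0])    # hypothetical nonzero biases
    x = rng.standard_normal((n, d))   # x ~ N(0, I_d)
    y = network(x, a, W, b)
    e1, e2 = empirical_moments(x, y)
    c1, c2 = closed_form_moments(a, W, b)
    return np.max(np.abs(e1 - c1)), np.max(np.abs(e2 - c2))


if __name__ == "__main__":
    print(demo())
```

Each hidden unit contributes a rank-one term whose weight depends on its bias ($\Phi(b_i)$, $\varphi(b_i)$ and $-b_i\varphi(b_i)$ at orders 1, 2 and 3 under the unit-norm assumption). This is one way to see why the abstract speaks of decomposing multiple higher-order tensors once $b \neq 0$.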
