LGOct 13, 2015

$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time

arXiv:1510.03528v1103 citations
Originality Highly original
AI Analysis

This provides a theoretical guarantee for polynomial-time learnability of sparse neural networks, addressing a foundational challenge in machine learning theory.

The paper tackles the problem of learning multi-layer neural networks with bounded weight norms, presenting a kernel-based method that achieves a generalization error within ε of the target network with polynomial sample and time complexity, applicable to sigmoid-like and ReLU-like activations.

We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has $k$ hidden layers and that the $\ell_1$-norm of the incoming weights of any neuron is bounded by $L$. We present a kernel-based method, such that with probability at least $1 - δ$, it learns a predictor whose generalization error is at most $ε$ worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in $(1/ε,\log(1/δ),F(k,L))$, where $F(k,L)$ is a function depending on $(k,L)$ and on the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes