$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time
This provides a theoretical guarantee for polynomial-time learnability of sparse neural networks, addressing a foundational challenge in machine learning theory.
The paper tackles the problem of learning multi-layer neural networks with bounded weight norms, presenting a kernel-based method that achieves a generalization error within ε of the target network with polynomial sample and time complexity, applicable to sigmoid-like and ReLU-like activations.
We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has $k$ hidden layers and that the $\ell_1$-norm of the incoming weights of any neuron is bounded by $L$. We present a kernel-based method, such that with probability at least $1 - δ$, it learns a predictor whose generalization error is at most $ε$ worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in $(1/ε,\log(1/δ),F(k,L))$, where $F(k,L)$ is a function depending on $(k,L)$ and on the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.