Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks
This work addresses the trade-off between sparsity and depth in neural networks and offers insights for efficient model design, though the contribution is incremental because it builds on existing NNGP theory.
The paper investigates how sparse neural activity affects generalization in deep Bayesian neural networks in the infinite-width limit, finding that sparser networks outperform non-sparse ones at shallow depths across a variety of datasets.
We investigate how sparse neural activity affects the generalization performance of a deep Bayesian neural network at the large width limit. To this end, we derive a neural network Gaussian Process (NNGP) kernel with rectified linear unit (ReLU) activation and a predetermined fraction of active neurons. Using the NNGP kernel, we observe that the sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets. We validate this observation by extending the existing theory on the generalization error of kernel-ridge regression.
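To make the ingredients of the abstract concrete, the following is a minimal sketch, not the paper's derivation. It shows (1) the standard closed-form (arc-cosine) recursion for a dense ReLU NNGP kernel, (2) a Monte Carlo estimate of a single-layer kernel expectation when only a fraction `f` of neurons is active, and (3) kernel-ridge regression on top of a kernel matrix. The function names, the weight and bias variances `sigma_w2` and `sigma_b2`, and in particular the way sparsity is imposed (a shifted ReLU whose threshold is set from the Gaussian quantile of the pre-activation) are assumptions made for illustration; the paper's sparse kernel may be defined differently.

```python
import numpy as np
from statistics import NormalDist


def relu_nngp_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Dense ReLU NNGP kernel on the rows of X (assumed nonzero) via the
    closed-form arc-cosine recursion. depth=0 returns the input-layer kernel."""
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        std = np.sqrt(np.diag(K))
        norm = np.outer(std, std)
        cos_t = np.clip(K / norm, -1.0, 1.0)  # guard against round-off
        theta = np.arccos(cos_t)
        K = sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_t
        )
    return K


def sparse_relu_expectation(k11, k12, k22, f, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[phi(u) phi(v)] for (u, v) ~ N(0, [[k11, k12], [k12, k22]]).

    ASSUMPTION (illustrative only): phi(h) = max(h - tau, 0), where tau is the
    (1 - f) quantile of the pre-activation, so a fraction f of neurons is active.
    """
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1.0 - f)  # standard normal quantile; f = 0.5 gives 0
    cov = np.array([[k11, k12], [k12, k22]])
    u, v = rng.multivariate_normal(np.zeros(2), cov, size=n_samples).T
    phi_u = np.maximum(u - z * np.sqrt(k11), 0.0)
    phi_v = np.maximum(v - z * np.sqrt(k22), 0.0)
    return float(np.mean(phi_u * phi_v))


def krr_predict(K_train, K_test_train, y_train, ridge=1e-6):
    """Kernel-ridge regression prediction (equal to the GP posterior mean
    when ridge is read as the observation-noise variance)."""
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(K_train)), y_train)
    return K_test_train @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(8, 4))   # 6 training points followed by 2 test points
    y = rng.normal(size=6)
    K = relu_nngp_kernel(X, depth=3)
    print(krr_predict(K[:6, :6], K[6:, :6], y))
    # With f = 0.5 the threshold is zero and the sparse estimate reduces to the dense ReLU case.
    print(sparse_relu_expectation(1.0, 0.5, 1.0, f=0.5))
```

Cross-kernels between training and test inputs are obtained here by building the kernel on the stacked inputs and slicing it, which keeps the recursion in one place. A full sparse kernel under the stated assumption would replace the closed-form update inside the depth loop with the thresholded expectation, evaluated analytically or numerically for every pair of inputs.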