On the Optimal Memorization Power of ReLU Neural Networks
This work addresses a foundational problem in understanding the capacity and efficiency of neural networks for memorization, with implications for theoretical machine learning.
The paper tackles the problem of memorizing N data points with ReLU neural networks, showing they can achieve this with approximately sqrt(N) parameters, which is optimal up to logarithmic factors, and also provides optimal bounds for networks with bounded depth.
We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any $N$ points that satisfy a mild separability assumption using $\tilde{O}\left(\sqrt{N}\right)$ parameters. Known VC-dimension upper bounds imply that memorizing $N$ samples requires $Ω(\sqrt{N})$ parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by $1 \leq L \leq \sqrt{N}$, for memorizing $N$ samples using $\tilde{O}(N/L)$ parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.