Injectivity capacity of ReLU gates
This work advances the theoretical understanding of ReLU networks, specifically the question of when a ReLU layer is injective, which is relevant to both researchers and practitioners in deep learning.
This paper investigates the injectivity capacity of ReLU network layers by establishing an isomorphism to the capacity of the $\ell_0$ spherical perceptron. Using fully lifted random duality theory (fl RDT), the authors develop a program whose lifting mechanism converges remarkably fast, with relative corrections in the estimated quantities not exceeding $\sim 0.1\%$ already at the third level of lifting.
We consider the injectivity property of ReLU network layers. Determining the ReLU injectivity capacity (the ratio of the numbers of a layer's inputs and outputs) is shown to be isomorphic to determining the capacity of the so-called $\ell_0$ spherical perceptron. Employing \emph{fully lifted random duality theory} (fl RDT), we develop a powerful program and use it to handle the $\ell_0$ spherical perceptron and, implicitly, the injectivity of ReLU layers. To put the fl RDT machinery to practical use, we also conduct a sizeable set of numerical evaluations. The lifting mechanism is observed to converge remarkably fast, with relative corrections in the estimated quantities not exceeding $\sim 0.1\%$ already at the third level of lifting. We also uncover closed-form explicit analytical relations among key lifting parameters. Besides being essential for handling all the required numerical work, these relations shed new light on the beautiful parametric interconnections within the lifting structure. Finally, the obtained results are shown to fairly closely match the replica predictions from [40].
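To make "injectivity of a ReLU layer" concrete, here is a minimal sketch of an elementary non-injectivity certificate. It is not the fl RDT program or the $\ell_0$ perceptron analysis. It assumes a layer $x \mapsto \mathrm{ReLU}(Wx)$ with a Gaussian $W$ of size $m \times n$, and all function names are hypothetical. The idea: if the rows $w_i$ with $\langle w_i, x\rangle \ge 0$ do not span $\mathbb{R}^n$, pick $v \ne 0$ orthogonal to all of them. For a small enough $\varepsilon$, $x + \varepsilon v$ leaves every active value unchanged and keeps every inactive entry negative, so $\mathrm{ReLU}(Wx) = \mathrm{ReLU}(W(x+\varepsilon v))$. This happens in particular whenever fewer than $n$ entries of $Wx$ are nonnegative, a counting ($\ell_0$-type) condition. Random search can only exhibit such collisions. Failing to find one proves nothing about injectivity.

```python
import random

def relu(v):
    """Elementwise ReLU of a list of floats."""
    return [max(0.0, t) for t in v]

def matvec(W, x):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def null_vector(rows, n, tol=1e-12):
    """Return a nonzero v with <r, v> = 0 for every r in rows, or None.

    Gauss-Jordan elimination with partial pivoting; any column without a
    pivot is free and yields a null-space vector (exists iff rank < n).
    """
    A = [list(r) for r in rows]
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n):
        if r >= len(A):
            break
        p = max(range(r, len(A)), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < tol:
            continue  # no usable pivot in this column: it stays free
        A[r], A[p] = A[p], A[r]
        piv = A[r][c]
        A[r] = [a / piv for a in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0.0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None  # rows span R^n
    f = free[0]
    v = [0.0] * n
    v[f] = 1.0
    for row_idx, c in enumerate(pivots):
        v[c] = -A[row_idx][f]  # solve x_c + A[row][f] * x_f = 0
    return v

def find_collision(W, n, trials=1000, seed=0):
    """Randomly search for x != x2 with ReLU(W x) == ReLU(W x2) (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = matvec(W, x)
        active = [W[i] for i in range(len(W)) if z[i] >= 0.0]
        v = null_vector(active, n)
        if v is None:
            continue  # active rows span R^n at this x; try another direction
        wv = matvec(W, v)
        # eps small enough that every inactive entry (z_i < 0) stays negative
        slack = [-z[i] / abs(wv[i]) for i in range(len(W))
                 if z[i] < 0.0 and wv[i] != 0.0]
        eps = 0.5 * min(slack) if slack else 1.0
        x2 = [a + eps * b for a, b in zip(x, v)]
        return x, x2
    return None

if __name__ == "__main__":
    n, m = 5, 5  # hypothetical sizes: n inputs, m outputs
    rng = random.Random(1)
    W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    pair = find_collision(W, n)
    if pair is not None:
        x, x2 = pair
        gap = max(abs(a - b) for a, b in zip(relu(matvec(W, x)), relu(matvec(W, x2))))
        print("collision found, max output difference:", gap)
```

With $m = n$, a random $x$ almost always leaves some entry of $Wx$ negative, so a collision appears immediately. As $m/n$ grows, random directions stop exposing collisions. Locating the ratio at which they disappear altogether is exactly what the capacity analysis above addresses.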