ML DIS-NN IT LG
Dec 27, 2024

Deep ReLU networks -- injectivity capacity upper bounds

arXiv:2412.19677v1, 3 citations, h-index: 22

Originality: Highly original

AI Analysis

This paper provides foundational mathematical insights into the injectivity properties of deep neural networks, addressing a problem in theoretical ML that existing mathematical methods had not been able to handle.

The paper addresses the problem of determining the minimal output-to-input ratio at which a deep ReLU network admits unique recovery of its input. It finds that a depth of only 4 layers suffices to closely approach the level at which no expansion is needed, which is consistent with observations from practical experiments.

We study deep ReLU feedforward neural networks (NNs) and their injectivity properties. The main focus is on \emph{precisely} determining the so-called injectivity capacity. For any given hidden-layer architecture, it is defined as the minimal ratio between the number of the network's outputs and the number of its inputs that ensures unique recoverability of the input from a realizable output. Strong recent progress in precisely studying single ReLU layer injectivity properties is here moved to the deep network level. In particular, we develop a program that connects deep $l$-layer net injectivity to an $l$-extension of the $\ell_0$ spherical perceptrons, thereby massively generalizing an isomorphism between studying single-layer injectivity and the capacity of the so-called (1-extension) $\ell_0$ spherical perceptrons discussed in [82]. \emph{Random duality theory} (RDT) based machinery is then created and utilized to statistically handle properties of the extended $\ell_0$ spherical perceptrons and, implicitly, of the deep ReLU NNs. A sizeable set of numerical evaluations is conducted as well to put the entire RDT machinery to practical use. From these we observe a rapidly decreasing tendency in the needed layer expansions, i.e., we observe a rapid \emph{expansion saturation effect}. Only $4$ layers of depth are sufficient to closely approach the level of no needed expansion, a result that fairly closely resembles observations made in practical experiments and that has so far remained completely untouchable by any of the existing mathematical methodologies.
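
To make the notion of an output-to-input ratio that guarantees unique recoverability concrete, the following is a minimal toy sketch and not the paper's RDT analysis. It uses two deterministic single-layer weight matrices chosen purely for illustration (the names `W1`, `W2`, `layer`, and the dimension `n = 4` are assumptions, not from the paper): with ratio 1 and identity weights, two distinct inputs collide, while with ratio 2 and the stacked weights [I; -I], the input can be recovered exactly from the output.

```python
import numpy as np

def relu(z):
    """Elementwise ReLU."""
    return np.maximum(z, 0.0)

def layer(W, x):
    """Single ReLU layer x -> relu(W x); W has shape (outputs, inputs)."""
    return relu(W @ x)

n = 4  # number of inputs (illustrative choice)

# Ratio outputs/inputs = 1 with W = I: not injective.
# Two different all-negative inputs are both mapped to the zero vector.
W1 = np.eye(n)
a = -np.ones(n)
b = -2.0 * np.ones(n)
collide = np.array_equal(layer(W1, a), layer(W1, b))

# Ratio outputs/inputs = 2 with W = [I; -I]: injective.
# The output is [relu(x); relu(-x)], and x = relu(x) - relu(-x).
W2 = np.vstack([np.eye(n), -np.eye(n)])
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
y = layer(W2, x)
x_rec = y[:n] - y[n:]

print("ratio 1 collision:", collide)
print("ratio 2 exact recovery:", np.allclose(x_rec, x))
```

This hand-built construction only shows that a ratio of 2 can suffice for one particular layer; the injectivity capacity discussed in the abstract concerns the minimal such ratio for a given hidden-layer architecture, which this sketch does not compute.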
