Approximating Positive Homogeneous Functions with Scale Invariant Neural Networks
This work addresses the theoretical limitations and capabilities of neural networks in solving inverse problems, a topic relevant to researchers in machine learning and signal processing. Its contribution is incremental in that it builds on prior work on function approximation.
The paper investigates the ability of ReLU neural networks to solve linear inverse problems by approximating positive homogeneous functions. It shows that one-hidden-layer networks cannot recover even 1-sparse vectors, whereas two-hidden-layer networks achieve stable approximate recovery with arbitrary precision and for arbitrary sparsity levels. These findings are then extended to problems such as low-rank matrix recovery and phase retrieval.
We investigate to what extent it is possible to solve linear inverse problems with ReLU networks. Due to the scaling invariance arising from the linearity, an optimal reconstruction function $f$ for such a problem is positive homogeneous, i.e., satisfies $f(λx) = λf(x)$ for all non-negative $λ$. In a ReLU network, this condition translates to considering networks without bias terms. We first consider recovery of sparse vectors from few linear measurements. We prove that ReLU networks with only one hidden layer cannot recover even $1$-sparse vectors, not even approximately, regardless of the width of the network. However, with two hidden layers, approximate recovery with arbitrary precision and arbitrary sparsity level $s$ is possible in a stable way. We then extend our results to a wider class of recovery problems, including low-rank matrix recovery and phase retrieval. Furthermore, we consider the approximation of general positive homogeneous functions with neural networks. Extending previous work, we establish new results explaining under which conditions such functions can be approximated with neural networks. Our results also shed some light on the seeming contradiction between previous works showing that neural networks for inverse problems typically have very large Lipschitz constants, yet still perform very well even under adversarial noise. Namely, the error bounds in our expressivity results combine a small constant term with a term that is linear in the noise level, indicating that robustness issues may occur only for very small noise levels.
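The link between positive homogeneity and bias-free ReLU networks can be checked numerically. The following is a minimal sketch, not the construction used in the paper: the helper names (`bias_free_net`, `random_matrix`), the layer sizes, and the random weights are all illustrative assumptions. It relies only on the fact that $\max(0, λt) = λ\max(0, t)$ for $λ \ge 0$, so scaling the input of a network without bias terms scales its output by the same factor.

```python
import random


def relu(v):
    """Apply the ReLU activation entrywise."""
    return [max(0.0, t) for t in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * t for w, t in zip(row, v)) for row in W]


def bias_free_net(weights, x):
    """ReLU network without bias terms; the last layer is purely linear."""
    h = list(x)
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)


def random_matrix(rows, cols, rng):
    """Random weights, used here only for illustration."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Two hidden layers with arbitrary (assumed) sizes: 3 -> 8 -> 8 -> 5.
weights = [
    random_matrix(8, 3, rng),
    random_matrix(8, 8, rng),
    random_matrix(5, 8, rng),
]

x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
lam = 2.5  # any non-negative scaling factor

lhs = bias_free_net(weights, [lam * t for t in x])      # f(lam * x)
rhs = [lam * t for t in bias_free_net(weights, x)]      # lam * f(x)

# The two sides agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print("max |f(lam*x) - lam*f(x)| =", max_gap)
```

Adding a nonzero bias vector to any layer would generally break this identity, which is why the positive homogeneity condition corresponds to networks without bias terms.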