FAMar 29, 2023
Optimal approximation using complex-valued neural networksPaul Geuchen, Felix Voigtlaender
Complex-valued neural networks (CVNNs) have recently shown promising empirical success, for instance for increasing the stability of recurrent neural networks and for improving the performance in tasks with complex-valued inputs, such as in MRI fingerprinting. While the overwhelming success of Deep Learning in the real-valued case is supported by a growing mathematical foundation, such a foundation is still largely lacking in the complex-valued case. We thus analyze the expressivity of CVNNs by studying their approximation properties. Our results yield the first quantitative approximation bounds for CVNNs that apply to a wide class of activation functions including the popular modReLU and complex cardioid activation functions. Precisely, our results apply to any activation function that is smooth but not polyharmonic on some non-empty open set; this is the natural generalization of the class of smooth and non-polynomial activation functions to the complex setting. Our main result shows that the error for the approximation of $C^k$-functions scales as $m^{-k/(2n)}$ for $m \to \infty$ where $m$ is the number of neurons, $k$ the smoothness of the target function and $n$ is the (complex) input dimension. Under a natural continuity assumption, we show that this rate is optimal; we further discuss the optimality when dropping this assumption. Moreover, we prove that the problem of approximating $C^k$-functions using continuous approximation methods unavoidably suffers from the curse of dimensionality.
MLNov 2, 2023
Upper and lower bounds for the Lipschitz constant of random neural networksPaul Geuchen, Dominik Stöger, Thomas Telaar et al.
Empirical studies have widely demonstrated that neural networks are highly sensitive to small, adversarial perturbations of the input. The worst-case robustness against these so-called adversarial examples can be quantified by the Lipschitz constant of the neural network. In this paper, we study upper and lower bounds for the Lipschitz constant of random ReLU neural networks. Specifically, we assume that the weights and biases follow a generalization of the He initialization, where general symmetric distributions for the biases are permitted. For deep networks of fixed depth and sufficiently large width, our established upper bound is larger than the lower bound by a factor that is logarithmic in the width. In contrast, for shallow neural networks we characterize the Lipschitz constant up to an absolute numerical constant that is independent of all parameters.
FADec 11, 2024
On best approximation by multivariate ridge functions with applications to generalized translation networksPaul Geuchen, Palina Salanevich, Olov Schavemaker et al.
In this paper, we prove sharp upper and lower bounds for the approximation of Sobolev functions by sums of multivariate ridge functions, i.e., for approximation by functions of the form $\mathbb{R}^d \ni x \mapsto \sum_{k=1}^n \varrho_k(A_k x) \in \mathbb{R}$ with $\varrho_k : \mathbb{R}^\ell \to \mathbb{R}$ and $A_k \in \mathbb{R}^{\ell \times d}$. We show that the order of approximation asymptotically behaves as $n^{-r/(d-\ell)}$, where $r$ is the regularity (order of differentiability) of the Sobolev functions to be approximated. Our lower bound even holds when approximating $L^\infty$-Sobolev functions of regularity $r$ with error measured in $L^1$, while our upper bound applies to the approximation of $L^p$-Sobolev functions in $L^p$ for any $1 \leq p \leq \infty$. These bounds generalize well-known results regarding the approximation properties of univariate ridge functions to the multivariate case. We use our results to obtain sharp asymptotic bounds for the approximation of Sobolev functions using generalized translation networks and complex-valued neural networks.
MLJun 24, 2025
Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networksSjoerd Dirksen, Patrick Finke, Paul Geuchen et al.
This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $Φ: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn from symmetric distributions. We derive high probability upper and lower bounds for wide networks that differ at most by a factor that is logarithmic in the network's width and linear in its depth. In the special case of shallow networks, we obtain matching bounds. Remarkably, the behavior of the $\ell^p$-Lipschitz constant varies significantly between the regimes $ p \in [1,2) $ and $ p \in [2,\infty] $. For $p \in [2,\infty]$, the $\ell^p$-Lipschitz constant behaves similarly to $\Vert g\Vert_{p'}$, where $g \in \mathbb{R}^d$ is a $d$-dimensional standard Gaussian vector and $1/p + 1/p' = 1$. In contrast, for $p \in [1,2)$, the $\ell^p$-Lipschitz constant aligns more closely to $\Vert g \Vert_{2}$.
FAMay 26, 2023
Universal approximation with complex-valued deep narrow neural networksPaul Geuchen, Thomas Jahn, Hannes Matt
We study the universality of complex-valued neural networks with bounded widths and arbitrary depths. Under mild assumptions, we give a full description of those activation functions $\varrho:\mathbb{C}\to \mathbb{C}$ that have the property that their associated networks are universal, i.e., are capable of approximating continuous functions to arbitrary accuracy on compact domains. Precisely, we show that deep narrow complex-valued networks are universal if and only if their activation function is neither holomorphic, nor antiholomorphic, nor $\mathbb{R}$-affine. This is a much larger class of functions than in the dual setting of arbitrary width and fixed depth. Unlike in the real case, the sufficient width differs significantly depending on the considered activation function. We show that a width of $2n+2m+5$ is always sufficient and that in general a width of $max\{2n,2m\}$ is necessary. We prove, however, that a width of $n+m+3$ suffices for a rich subclass of the admissible activation functions. Here, $n$ and $m$ denote the input and output dimensions of the considered networks. Moreover, for the case of smooth and non-polyharmonic activation functions, we provide a quantitative approximation bound in terms of the depth of the considered networks.