Towards a regularity theory for ReLU networks -- chain rule and global error estimates
This work addresses a foundational mathematical gap for researchers in machine learning theory, particularly those studying neural network approximation and its applications to PDEs. It is incremental in the sense that it builds on existing approximation theory.
The authors address the lack of a rigorous chain rule for derivatives of neural networks with locally Lipschitz continuous activation functions, such as ReLU. They introduce a notion of derivative that admits a chain rule, together with a method for converting approximation results on bounded domains into global pointwise estimates. This allows existing neural network approximation theory to be extended to regularity properties, and it contributes to the understanding of deep learning methods for high-dimensional PDEs.
Although the classical derivative of a neural network with locally Lipschitz continuous activation function exists almost everywhere, the standard chain rule is in general not applicable. We consider a way of introducing a derivative for neural networks that admits a chain rule and is both rigorous and easy to work with. In addition, we present a method for converting approximation results on bounded domains to global (pointwise) estimates. This can be used to extend known neural network approximation theory to include the study of regularity properties. Of particular interest is the application to neural networks with ReLU activation function, where it contributes to the understanding of the success of deep learning methods for high-dimensional partial differential equations.
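To make the chain rule issue concrete, the following minimal sketch looks at the one-dimensional ReLU network f(x) = ReLU(x) - ReLU(-x), which equals x and therefore has classical derivative 1 everywhere. It is an illustration only and not necessarily the construction used in the paper: the function names and the convention ReLU'(0) = 0 are assumptions made here. Applying the chain rule formally with that convention gives the correct value away from the kink but the value 0 at x = 0, where the classical chain rule is not applicable because ReLU is not differentiable there.

```python
def relu(x: float) -> float:
    """ReLU activation."""
    return max(x, 0.0)


def relu_grad(x: float) -> float:
    """Formal derivative of ReLU.

    Assumption for this sketch: the value at the kink x = 0 is set to 0.
    ReLU is not classically differentiable there, so this is a convention.
    """
    return 1.0 if x > 0 else 0.0


def f(x: float) -> float:
    """A small ReLU network that represents the identity: f(x) = x."""
    return relu(x) - relu(-x)


def f_formal_grad(x: float) -> float:
    """Chain rule applied formally, layer by layer, using relu_grad.

    d/dx relu(x)  = relu_grad(x)  * 1
    d/dx relu(-x) = relu_grad(-x) * (-1)
    """
    return relu_grad(x) * 1.0 - relu_grad(-x) * (-1.0)


def f_true_grad(x: float) -> float:
    """Classical derivative of f, which is the identity map."""
    return 1.0


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, f(x), f_formal_grad(x), f_true_grad(x))
```

In this example the formal and classical derivatives differ only at the single point x = 0, a set of measure zero. This is the kind of behaviour that a rigorous chain rule for such networks has to account for.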