ML · LG · Jun 27, 2025

Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks

arXiv:2506.22429v1 · 10.35 citations · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses a gap in deep learning theory by extending kernel analysis beyond ReLU to common activation functions. The advance is incremental because it builds on the existing NTK/NNGP frameworks.

The paper tackles the limited theoretical understanding of neural tangent kernels (NTK) and neural network Gaussian process (NNGP) kernels beyond ReLU activations. It provides a general characterization of the RKHS of these kernels for typical activation functions such as SELU, ELU, and LeakyReLU. It shows that a broad class of activations that are not infinitely smooth generate equivalent RKHSs across depths, while polynomial activations do not.

While the theory of deep learning has made some progress in recent years, much of it is limited to the ReLU activation function. In particular, while the neural tangent kernel (NTK) and neural network Gaussian process kernel (NNGP) have given theoreticians tractable limiting cases of fully connected neural networks, their properties for most activation functions except for powers of the ReLU function are poorly understood. Our main contribution is to provide a more general characterization of the RKHS of these kernels for typical activation functions whose only non-smoothness is at zero, such as SELU, ELU, or LeakyReLU. Our analysis also covers a broad set of special cases such as missing biases, two-layer networks, or polynomial activations. Our results show that a broad class of not infinitely smooth activations generate equivalent RKHSs at different network depths, while polynomial activations generate non-equivalent RKHSs. Finally, we derive results for the smoothness of NNGP sample paths, characterizing the smoothness of infinitely wide neural networks at initialization.
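To make the NNGP kernel mentioned above concrete, here is a minimal sketch that estimates it by Monte Carlo for an arbitrary activation. It is not taken from the paper. It assumes the standard layer-wise NNGP recursion for fully connected networks, with hypothetical parameters `sigma_w` and `sigma_b` for the weight and bias standard deviations and a `1/d` scaling of the input inner product; the paper's exact parameterization may differ.

```python
import numpy as np


def leaky_relu(z, slope=0.01):
    """LeakyReLU, one of the activations named in the abstract."""
    return np.where(z > 0, z, slope * z)


def nngp_kernel(x1, x2, activation, depth=2, sigma_w=1.0, sigma_b=0.0,
                n_samples=200_000, seed=0):
    """Monte Carlo estimate of NNGP kernel entries after `depth` hidden layers.

    Assumed recursion (standard convention, not confirmed by the source):
        K^0(x, x')     = sigma_b^2 + sigma_w^2 * <x, x'> / d
        K^{l+1}(x, x') = sigma_b^2 + sigma_w^2 * E[act(u) * act(v)],
    where (u, v) is a centered Gaussian with the 2x2 covariance built from K^l.
    Returns the tuple (K(x1, x1), K(x1, x2), K(x2, x2)).
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    d = x1.size

    # Covariance of the first-layer pre-activations.
    k11 = sigma_b**2 + sigma_w**2 * (x1 @ x1) / d
    k12 = sigma_b**2 + sigma_w**2 * (x1 @ x2) / d
    k22 = sigma_b**2 + sigma_w**2 * (x2 @ x2) / d

    for _ in range(depth):
        cov = np.array([[k11, k12], [k12, k22]])
        # Sample pre-activation pairs (u, v) with the current covariance.
        uv = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)
        a, b = activation(uv[:, 0]), activation(uv[:, 1])
        # Replace the Gaussian expectation with a sample average.
        k11 = sigma_b**2 + sigma_w**2 * np.mean(a * a)
        k12 = sigma_b**2 + sigma_w**2 * np.mean(a * b)
        k22 = sigma_b**2 + sigma_w**2 * np.mean(b * b)

    return k11, k12, k22


if __name__ == "__main__":
    # Kernel entries for two orthogonal inputs under LeakyReLU, depth 2.
    print(nngp_kernel([1.0, 0.0], [0.0, 1.0], leaky_relu, depth=2))
```

Swapping `leaky_relu` for another activation (for example an ELU or a polynomial) and varying `depth` gives a rough numerical view of how the kernel changes with activation and depth. A numerical estimate like this does not by itself establish the RKHS equivalence results stated in the abstract.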
