LG · AI · ML · Oct 26, 2024

Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations

arXiv:2410.20107v2 · 1 citation · h-index: 6 · AISTATS
AI Analysis

This paper offers theoretical insight into the implicit biases of deep neural networks, a fundamental question for researchers in machine learning theory.

The authors study how the similarity between hidden representations evolves across layers in deep neural networks. They show that, for nonlinear activations, the kernel sequence converges globally to a unique fixed point, which corresponds to either orthogonal or similar representations depending on the activation and architecture.

Understanding how neural networks transform input data across layers is fundamental to unraveling their learning and generalization capabilities. Although prior work has used insights from kernel methods to study neural networks, a global analysis of how the similarity between hidden representations evolves across layers remains underexplored. In this paper, we introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representations of two different inputs. Operating under the mean-field regime, we show that the kernel sequence evolves deterministically via a kernel map, which depends only on the activation function. By expanding the activation in Hermite polynomials and using their algebraic properties, we derive an explicit form for the kernel map and fully characterize its fixed points. Our analysis reveals that for nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to orthogonal or similar representations depending on the activation and network architecture. We further extend our results to networks with residual connections and normalization layers, demonstrating similar convergence behaviors. This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
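To make the kernel map in the abstract concrete, here is a minimal numerical sketch. It is not the authors' code. It relies on the standard identity that, for standard Gaussian variables with correlation ρ, E[φ(X)φ(Y)] = Σ_k c_k² ρ^k, where c_k are the coefficients of φ in the normalized probabilists' Hermite basis. The function names (`hermite_coeffs`, `kernel_map`, `iterate_kernel`), the truncation at 30 terms, the quadrature size, and the normalization that forces κ(1) = 1 are all assumptions made for illustration; the paper's exact conventions may differ.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coeffs(phi, num_terms=30, quad_points=100):
    """Approximate c_k = E[phi(X) he_k(X)] for X ~ N(0, 1).

    he_k = He_k / sqrt(k!) is the normalized probabilists' Hermite polynomial.
    Expectations are computed with Gauss-Hermite quadrature.
    """
    x, w = hermegauss(quad_points)        # nodes/weights for weight exp(-x^2 / 2)
    w = w / math.sqrt(2.0 * math.pi)      # rescale so the weights sum to 1
    fx = phi(x)
    coeffs = np.empty(num_terms)
    for k in range(num_terms):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                    # selects He_k in the HermiteE series
        he_k = hermeval(x, basis) / math.sqrt(math.factorial(k))
        coeffs[k] = np.sum(w * fx * he_k)
    return coeffs


def kernel_map(coeffs, rho):
    """Truncated kernel map: sum_k c_k^2 rho^k, normalized so kernel_map(1) = 1."""
    sq = coeffs ** 2
    powers = rho ** np.arange(len(coeffs))
    return float(np.sum(sq * powers) / np.sum(sq))


def iterate_kernel(coeffs, rho0, depth):
    """Apply the kernel map `depth` times, one application per layer."""
    rho = rho0
    for _ in range(depth):
        rho = kernel_map(coeffs, rho)
    return rho


# Illustrative run (assumption: tanh activation, initial similarity 0.9).
tanh_coeffs = hermite_coeffs(np.tanh)
rho_deep = iterate_kernel(tanh_coeffs, rho0=0.9, depth=500)
```

Because tanh is odd, its zeroth Hermite coefficient vanishes, so ρ = 0 is a fixed point of this truncated map, and the iterate started at 0.9 drifts toward it, which is the "orthogonal representations" case. An activation with a nonzero mean term (c_0 ≠ 0) shifts the fixed point away from zero under the same recipe.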

Code Implementations: 1 repo