Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
This work addresses how architecture shapes generalization and interpretability in neural networks, a question relevant to machine learning researchers, though the contribution appears incremental because it builds on existing geometric analyses.
The paper studies how neural network architecture influences generalization. It proposes the geometric invariance hypothesis (GIH), which states that a network's input space curvature remains invariant in certain architecture-dependent directions during training. On a binary classification task whose data lie on a plane, ResNets, unlike MLPs, fail to generalize for some orientations of that plane, and the experimental results link GIH to generalization.
In this paper, we propose the $\textit{geometric invariance hypothesis (GIH)}$, which argues that the input space curvature of a neural network remains invariant under transformation in certain architecture-dependent directions during training. We investigate a simple, non-linear binary classification problem residing on a plane in a high-dimensional space and observe that, unlike MLPs, ResNets fail to generalize depending on the orientation of the plane. Motivated by this example, we define a neural network's $\textbf{average geometry}$ and $\textbf{average geometry evolution}$ as compact $\textit{architecture-dependent}$ summaries of the model's input-output geometry and its evolution during training. By investigating the average geometry evolution at initialization, we discover that the geometry of a neural network evolves according to the data covariance projected onto its average geometry. This means that the geometry only changes in a subset of the input space when the average geometry is low-rank, such as in ResNets. This causes an architecture-dependent invariance property in the input space curvature, which we dub GIH. Finally, we present extensive experimental results to observe the consequences of GIH and how it relates to generalization in neural networks.
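The low-rank argument in the abstract can be illustrated with a small NumPy sketch. It is not the paper's method: the abstract does not give the formal definition of average geometry, so the sketch assumes a hypothetical stand-in, a rank-4 coordinate projector `geometry`, and assumes that "projected" means the product `geometry @ cov @ geometry.T`. The XOR-style labels, the dimensions, and the helper names `plane_dataset` and `projected_covariance` are likewise illustrative choices. Under these assumptions, data on a plane inside the range of the low-rank matrix gives a nonzero projected covariance, while data on a plane orthogonal to that range gives exactly zero.

```python
import numpy as np


def plane_dataset(n, basis, rng):
    """Sample n points on the 2-D plane spanned by the columns of `basis`
    (shape: dim x 2) and attach XOR-style labels.
    The labeling rule is an assumed stand-in for the paper's task."""
    coords = rng.uniform(-1.0, 1.0, size=(n, 2))   # in-plane coordinates
    x = coords @ basis.T                           # embed into R^dim
    y = (coords[:, 0] * coords[:, 1] > 0).astype(int)
    return x, y


def projected_covariance(x, geometry):
    """Uncentered data second-moment matrix, projected from both sides
    by `geometry` (an assumed form of the projection)."""
    cov = x.T @ x / len(x)
    return geometry @ cov @ geometry.T


rng = np.random.default_rng(0)
dim, rank = 16, 4

# Hypothetical low-rank "average geometry": a projector onto the first
# `rank` input coordinates. A real architecture would induce its own matrix.
geometry = np.zeros((dim, dim))
geometry[:rank, :rank] = np.eye(rank)

# Plane lying inside the range of `geometry` (coordinates 0 and 1).
basis_in = np.zeros((dim, 2))
basis_in[0, 0] = 1.0
basis_in[1, 1] = 1.0

# Plane orthogonal to the range of `geometry` (coordinates rank, rank + 1).
basis_out = np.zeros((dim, 2))
basis_out[rank, 0] = 1.0
basis_out[rank + 1, 1] = 1.0

x_in, y_in = plane_dataset(1000, basis_in, rng)
x_out, y_out = plane_dataset(1000, basis_out, rng)

norm_in = np.linalg.norm(projected_covariance(x_in, geometry))
norm_out = np.linalg.norm(projected_covariance(x_out, geometry))

print(f"projected covariance norm, plane inside range:     {norm_in:.3f}")
print(f"projected covariance norm, plane orthogonal to it: {norm_out:.3f}")
```

In this toy setting the two datasets differ only by the orientation of the plane, yet only the first produces a nonzero driving term. This mirrors the abstract's claim that orientation matters when the average geometry is low-rank, without reproducing the paper's actual experiments.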