LG · Apr 13

Inter-Layer Hessian Analysis of Neural Networks with DAG Architectures

arXiv:2604.11639 · 35.5 · h-index: 10
Predicted impact top 68% in LG · last 90 days · Originality: Incremental advance
AI Analysis

Provides a theoretical framework for understanding Hessian structure in arbitrary DAG architectures, enabling new diagnostics for optimization and generalization, but remains analytical without demonstrating practical improvements.

The paper decomposes the neural network Hessian into inter-layer blocks using a DAG-based formalism, separating Gauss-Newton and tensor components. It introduces stochastic diagnostic metrics (e.g., resonance, GN-Gap) that reveal curvature interactions, validated on MLPs and ResNet-18.

Modern automatic differentiation frameworks (JAX, PyTorch) return the Hessian of the loss function as a monolithic tensor, without exposing the internal structure of inter-layer interactions. This paper presents an analytical formalism that explicitly decomposes the full Hessian into blocks indexed by the DAG of an arbitrary architecture. The canonical decomposition $H = H^{GN} + H^T$ separates the Gauss--Newton component (convex part) from the tensor component (residual curvature responsible for saddle points). For piecewise-linear activations (ReLU), the tensor component of the input Hessian vanishes ($H^{T}_{v,w}\!\equiv\!0$ a.e., $H^f_{v,w}\!=\!H^{GN}_{v,w}\!\succeq\!0$); the full parametric Hessian contains residual terms that do not reduce to the generalized Gauss--Newton (GGN) matrix. Building on this decomposition, we introduce diagnostic metrics (inter-layer resonance~$\mathcal{R}$, geometric coupling~$\mathcal{C}$, stable rank~$\mathcal{D}$, GN-Gap) that are estimated stochastically in $O(P)$ time and reveal structural curvature interactions between layers. The theoretical analysis explains exponential decay of resonance in vanilla networks and its preservation under skip connections; empirical validation spans fully connected MLPs (Exp.\,1--5) and convolutional architectures (ResNet-18, ${\sim}11$M~parameters, Exp.\,6). When the architecture reduces to a single node, all definitions collapse to the standard Hessian $\nabla^2_\theta\mathcal{L}(\theta)\in\mathbb{R}^{p\times p}$.
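
The ReLU claim in the abstract can be checked on a toy case. The sketch below is not the paper's DAG formalism: it assumes a hypothetical one-hidden-layer ReLU network with scalar output $f(x)$ and a squared loss $\tfrac12 (f(x) - y)^2$, for which the standard chain-rule identity gives $\nabla_x^2 \mathcal{L} = \nabla f\,\nabla f^{\top} + (f - y)\,\nabla_x^2 f$. Away from activation kinks $f$ is linear in $x$, so the second (tensor) term should vanish and the input Hessian should equal the Gauss-Newton term. All weights, the target, and the evaluation point are made-up illustration values.

```python
# Toy check: for a ReLU network, the input Hessian of a squared loss equals
# its Gauss-Newton part away from activation kinks.
# All numbers below are hypothetical illustration values.

W1 = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.5]]  # hidden-layer weights (3 x 2)
W2 = [1.0, -1.0, 2.0]                        # output weights
Y = 0.5                                      # regression target


def preact(x):
    """Hidden pre-activations W1 @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]


def f(x):
    """Scalar network output W2 . relu(W1 x)."""
    return sum(v * max(h, 0.0) for v, h in zip(W2, preact(x)))


def loss(x):
    return 0.5 * (f(x) - Y) ** 2


def grad_f(x):
    """Gradient of f w.r.t. x; only active ReLU units contribute."""
    g = [0.0] * len(x)
    for v, h, row in zip(W2, preact(x), W1):
        if h > 0.0:
            for j, w in enumerate(row):
                g[j] += v * w
    return g


def gauss_newton(x):
    """Gauss-Newton term g g^T (the squared loss has unit curvature in f)."""
    g = grad_f(x)
    return [[gi * gj for gj in g] for gi in g]


def fd_hessian(fn, x, eps=1e-3):
    """Central finite-difference Hessian of fn at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                z = list(x)
                z[i] += si * eps
                z[j] += sj * eps
                return fn(z)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return H


x0 = [0.3, 0.2]  # chosen so no pre-activation is near zero
H_full = fd_hessian(loss, x0)
H_gn = gauss_newton(x0)
max_gap = max(abs(a - b)
              for row_a, row_b in zip(H_full, H_gn)
              for a, b in zip(row_a, row_b))
print("max |H_full - H_gn| =", max_gap)
```

Swapping `max(h, 0.0)` for a smooth activation such as `math.tanh(h)` (and updating `grad_f` accordingly) would make the tensor term nonzero, so the gap would no longer vanish. The same argument does not carry over to derivatives with respect to the weights, which matches the abstract's remark that the full parametric Hessian keeps residual terms.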

