LG · CV · Apr 24, 2025

The effects of Hessian eigenvalue spectral density type on the applicability of Hessian analysis to generalization capability assessment of neural networks

arXiv:2504.17618
AI Analysis

This work examines the reliability of Hessian-based generalization assessment for neural networks and identifies its limitations when gradients are externally manipulated. The contribution is incremental but important for researchers in optimization and generalization theory.

The paper investigates how different types of Hessian eigenvalue spectral density (HESD) affect the applicability of Hessian analysis for assessing neural network generalization. It finds that mainly positive HESD occurs in standard training, while mainly negative HESD results from external gradient manipulation, and it proposes criteria to determine the HESD type and estimate generalization potential.

Hessians of neural networks (NNs) contain essential information about the curvature of NN loss landscapes, which can be used to estimate NN generalization capabilities. We have previously proposed generalization criteria that rely on the observation that the Hessian eigenvalue spectral density (HESD) behaves similarly for a wide class of NNs. This paper further studies their applicability by investigating factors that can result in different types of HESD. We conduct a wide range of experiments showing that HESD has mainly positive eigenvalues (MP-HESD) for NN training and fine-tuning with various optimizers on different datasets with different preprocessing and augmentation procedures. We also show that mainly negative HESD (MN-HESD) is a consequence of external gradient manipulation, indicating that the previously proposed Hessian analysis methodology cannot be applied in such cases. In addition, we propose criteria and corresponding conditions to determine the HESD type and estimate NN generalization potential. These HESD types and the previously proposed generalization criteria are combined into a unified HESD analysis methodology. Finally, we discuss how HESD changes during training and show the occurrence of quasi-singular (QS) HESD and its influence on the proposed methodology and on the conventional assumptions about the relation between Hessian eigenvalues and NN loss landscape curvature.
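To make the distinction between mainly positive and mainly negative spectra concrete, the following is a minimal sketch of computing Hessian eigenvalues for a toy network and checking which sign dominates. It is an illustration only, not the estimation procedure or the criteria from the paper, which the abstract does not spell out. The helper names (`hessian_fd`, `positive_mass`, `spectrum_type`), the 0.5 threshold, and the toy data are all assumptions introduced here. The Hessian is approximated densely by finite differences, which is only feasible for a handful of parameters.

```python
import numpy as np

def hessian_fd(loss, w, eps=1e-3):
    """Central finite-difference Hessian of a scalar loss at point w (toy sizes only)."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (loss(w + ei + ej) - loss(w + ei - ej)
                       - loss(w - ei + ej) + loss(w - ei - ej)) / (4 * eps**2)
            H[j, i] = H[i, j]  # the Hessian is symmetric
    return H

def positive_mass(eigs):
    """Share of total absolute eigenvalue mass carried by positive eigenvalues."""
    total = np.abs(eigs).sum()
    return float(eigs[eigs > 0].sum() / total) if total > 0 else 0.5

def spectrum_type(eigs, threshold=0.5):
    """Hypothetical rule: the 0.5 threshold is an assumption, not the paper's criterion."""
    return "mainly positive" if positive_mass(eigs) > threshold else "mainly negative"

# Toy regression problem with a 2-3-1 tanh network (9 parameters, no biases).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))
y = np.sin(X[:, 0])

def mse_loss(w):
    W1 = w[:6].reshape(2, 3)   # input-to-hidden weights
    w2 = w[6:9]                # hidden-to-output weights
    return float(np.mean((np.tanh(X @ W1) @ w2 - y) ** 2))

w0 = rng.normal(scale=0.5, size=9)
eigs = np.linalg.eigvalsh(hessian_fd(mse_loss, w0))  # eigenvalues in ascending order
print(eigs)
print(spectrum_type(eigs))
```

For networks of realistic size a dense Hessian is out of reach, so the spectral density is normally estimated with stochastic methods based on Hessian-vector products. The sketch above only shows what "mainly positive" and "mainly negative" refer to.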

