A simple connection from loss flatness to compressed neural representations
This provides a principled resolution to contradictory findings on sharpness and generalization, which is foundational for understanding optimization in machine learning.
The paper tackled the unclear significance of loss sharpness by linking it to neural representation compression, showing that flatter minima mathematically limit compression and empirically validating consistent correlations across architectures.
Despite extensive study, the significance of sharpness -- the trace of the loss Hessian at local minima -- remains unclear. We investigate an alternative perspective: how sharpness relates to the geometric structure of neural representations, specifically representation compression, defined as how strongly neural activations concentrate under local input perturbations. We introduce three measures -- Local Volumetric Ratio (LVR), Maximum Local Sensitivity (MLS), and Local Dimensionality -- and derive upper bounds showing these are mathematically constrained by sharpness: flatter minima necessarily limit compression. We extend these bounds to reparametrization-invariant sharpness and introduce network-wide variants (NMLS, NVR) that provide tighter, more stable bounds than prior single-layer analyses. Empirically, we validate consistent positive correlations across feedforward, convolutional, and transformer architectures. Our results suggest that sharpness fundamentally quantifies representation compression, offering a principled resolution to contradictory findings on the sharpness-generalization relationship.