Double Descent Risk and Volume Saturation Effects: A Geometric Perspective
This paper addresses a theoretical question about generalization behavior, aimed at researchers in machine learning and statistics, and offers incremental insights into model selection criteria.
The paper investigates the double-descent risk phenomenon in machine learning by analyzing the logarithm of the model volume from a geometric perspective. For isotropic linear regression and statistical lattices, it finds that this term decomposes into components that help explain why generalization error does not always increase with model dimensionality.
The appearance of the double-descent risk phenomenon has received growing interest in the machine learning and statistics community, as it challenges well-understood notions behind the U-shaped train-test curves. Motivated by Rissanen's minimum description length (MDL), Balasubramanian's Occam's razor, and Amari's information geometry, we investigate how the logarithm of the model volume, $\log V$, extends the intuition behind the AIC and BIC model selection criteria. We find that for the particular model classes of isotropic linear regression and statistical lattices, the $\log V$ term can be decomposed into a sum of distinct components, each of which helps explain the appearance of this phenomenon. In particular, these components suggest why generalization error does not necessarily continue to grow with increasing model dimensionality.
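For orientation, the standard forms of the two criteria are recalled here, together with a sketch of where a volume term enters. The symbols $k$ (number of parameters), $n$ (sample size), $\hat{L}$ (maximized likelihood), $\Theta$ (parameter space), and $g(\theta)$ (Fisher information metric) are notation assumed for this illustration and do not appear in the abstract. The second display also assumes that $\log V$ corresponds to the Fisher-volume term in Balasubramanian's asymptotic expansion, with lower-order terms omitted; the paper's exact definition may differ.

$$\mathrm{AIC} = -2\log \hat{L} + 2k, \qquad \mathrm{BIC} = -2\log \hat{L} + k \log n$$

$$-\log \hat{L} + \frac{k}{2}\log\frac{n}{2\pi} + \log V, \qquad V = \int_{\Theta} \sqrt{\det g(\theta)}\, d\theta$$

Under these assumptions, the first two terms of the second expression match $\mathrm{BIC}/2$ up to a constant in $k$, so $\log V$ acts as an additional complexity penalty that need not grow linearly with $k$.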