Is Memorization Helpful or Harmful? Prior Information Sets the Threshold
This work studies when memorizing the training data helps or hurts generalization in machine learning, providing theoretical results for researchers in statistical learning.
The paper investigates the relationship between training error and generalization error in overparameterized linear models with Bayesian priors. It finds that optimal generalization requires either near-interpolation (memorization) or a training error close to the noise level, depending on whether the noise reaches thresholds set by the Fisher information and variance parameters of the prior.
We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We identify determining factors inherent to the prior distribution $\pi$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise level (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $\pi$.
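As a minimal illustration of the quantities discussed above, the following sketch simulates an overparameterized linear model and reports, for a family of ridge estimators, the training error relative to the noise variance alongside the excess risk. It rests on assumptions that are not taken from the paper: an isotropic Gaussian prior, a Gaussian design, and ridge regression as the family of estimators (the paper treats general priors and arbitrary estimating procedures). The function name and all parameters are hypothetical.

```python
import numpy as np

def ridge_train_and_excess_risk(n=50, d=200, sigma=0.5, tau=1.0,
                                lambdas=(1e-8, 1e-2, 1.0, 100.0), seed=0):
    """Return (lambda, train_mse / sigma^2, excess_risk) for each ridge penalty.

    Assumed model (illustrative only): y = X @ theta + sigma * eps with d > n,
    theta drawn from an isotropic Gaussian prior N(0, tau^2 / d * I).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))                    # Gaussian design
    theta = rng.normal(0.0, tau / np.sqrt(d), size=d)  # draw from the assumed prior
    y = X @ theta + sigma * rng.standard_normal(n)     # noisy responses

    gram = X @ X.T                                     # n x n Gram matrix
    results = []
    for lam in lambdas:
        # Ridge in dual form: theta_hat = X^T (X X^T + lam I)^{-1} y
        alpha = np.linalg.solve(gram + lam * np.eye(n), y)
        theta_hat = X.T @ alpha

        # Training error, normalized by the noise variance
        train_ratio = np.mean((X @ theta_hat - y) ** 2) / sigma**2

        # For test covariates x ~ N(0, I), the excess prediction risk
        # E[(x^T (theta_hat - theta))^2] equals ||theta_hat - theta||^2
        excess_risk = float(np.sum((theta_hat - theta) ** 2))

        results.append((lam, float(train_ratio), excess_risk))
    return results

if __name__ == "__main__":
    for lam, train_ratio, risk in ridge_train_and_excess_risk():
        print(f"lambda={lam:g}  train_mse/sigma^2={train_ratio:.3e}  excess_risk={risk:.3f}")
```

Sweeping `lambdas` from very small to large moves the estimator from near interpolation toward heavier shrinkage. Rerunning with different values of `sigma` gives a rough, simulation-level feel for how the training error of the best-generalizing ridge estimator changes with the noise level. It does not reproduce the paper's thresholds.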