LG · AI · Apr 15

From Order to Distribution: A Spectral Characterization of Forgetting in Continual Learning

arXiv:2604.13460 · 51.2 · h-index: 2
Predicted impact: top 49% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For continual learning theorists, this work offers a rigorous distribution-based analysis of forgetting, moving beyond previous order-based analyses, though it is limited to an exact-fit linear regression setting.

This paper provides a theoretical characterization of forgetting in continual learning by shifting from random task orderings to i.i.d. task sampling from a distribution. The authors derive an exact operator identity for forgetting, establish an unconditional upper bound, identify the leading asymptotic term, and characterize the convergence rate in terms of geometric properties of the task distribution.

A central challenge in continual learning is forgetting, the loss of performance on previously learned tasks induced by sequential adaptation to new ones. While forgetting has been extensively studied empirically, rigorous theoretical characterizations remain limited. A notable step in this direction is Evron et al. (2022), which analyzes forgetting under random orderings of a fixed task collection in overparameterized linear regression. We shift the perspective from order to distribution. Rather than asking how a fixed task collection behaves under random orderings, we study an exact-fit linear regime in which tasks are sampled i.i.d. from a task distribution Π, and ask how the generating distribution itself governs forgetting. In this setting, we derive an exact operator identity for the forgetting quantity, revealing a recursive spectral structure. Building on this identity, we establish an unconditional upper bound, identify the leading asymptotic term, and, in generic nondegenerate cases, characterize the convergence rate up to constants. We further relate this rate to geometric properties of the task distribution, clarifying what drives slow or fast forgetting in this model.
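
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction. It assumes a shared ground-truth weight vector (so every task can be fit exactly), an isotropic Gaussian design as a hypothetical stand-in for the task distribution Π, a minimum-norm update per task as in the order-based linear regression analyses, and a forgetting measure equal to the average squared residual over tasks seen so far. The function name `simulate` and all parameters are illustrative choices.

```python
import numpy as np

def simulate(seed=0, d=20, rank=2, steps=60):
    """Continual exact-fit linear regression on i.i.d. tasks (illustrative sketch).

    Assumptions (not taken from the paper): tasks share one ground-truth
    vector w_star, each task design is a (rank x d) standard Gaussian
    matrix, and forgetting is the mean squared residual over seen tasks.
    """
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)

    w = np.zeros(d)
    seen = []
    forgetting, errors = [], []
    for _ in range(steps):
        # Draw a fresh task i.i.d. from the stand-in distribution.
        X = rng.standard_normal((rank, d))
        y = X @ w_star
        # Minimum-norm correction that fits the current task exactly;
        # it projects the error w - w_star off the row space of X.
        w = w + np.linalg.pinv(X) @ (y - X @ w)
        seen.append((X, y))
        # Average squared residual over every task seen so far.
        losses = [np.mean((Xs @ w - ys) ** 2) for Xs, ys in seen]
        forgetting.append(float(np.mean(losses)))
        errors.append(float(np.linalg.norm(w - w_star)))
    return np.array(forgetting), np.array(errors)

if __name__ == "__main__":
    forgetting, errors = simulate()
    for k in (0, 9, 29, 59):
        print(f"step {k + 1:3d}: forgetting={forgetting[k]:.3e}, "
              f"error norm={errors[k]:.3e}")
```

Because each update is an orthogonal projection of the error, the distance to the ground-truth vector can only shrink in this toy model. Swapping the Gaussian design for a distribution concentrated on a few nearly aligned directions is one way to explore how the geometry of the task distribution changes the decay, under the same assumptions.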
