LG · AI · ML · Feb 11, 2025

Analysis of Overparameterization in Continual Learning under a Linear Model

arXiv:2502.10442v1 · 3 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

The paper provides a theoretical foundation for understanding forgetting in continual learning, but the contribution is incremental because the analysis is limited to a simplified linear model.

The paper tackles catastrophic forgetting in continual learning by analyzing a linear regression model trained with gradient descent, showing that overparameterization alone can mitigate forgetting. The authors prove that, as the overparameterization ratio increases, a model trained sequentially on two tasks achieves low risk on the first task.
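The mechanism behind "overparameterization alone" is the implicit bias of plain gradient descent. A short derivation in hypothetical notation of our own (not necessarily the paper's), assuming each data matrix has full row rank and, in the last line, noiseless labels:

```latex
% Task t has data X_t \in \mathbb{R}^{n \times p} (p > n, full row rank) and labels y_t.
% Gradient descent on L_t(w) = \tfrac12 \|X_t w - y_t\|_2^2, started at w_{t-1} with
% step size 0 < \eta < 2 / \sigma_{\max}(X_t)^2, only moves within the row space of X_t,
% so it converges to the interpolating solution closest to its starting point:
\begin{aligned}
w_t &= \arg\min_{w \,:\, X_t w = y_t} \|w - w_{t-1}\|_2
     = w_{t-1} + X_t^{+}\,(y_t - X_t w_{t-1}), \qquad w_0 = 0, \\
% With noiseless labels y_t = X_t w_t^\star and \Pi_t = X_t^{+} X_t (projection onto row(X_t)):
w_1 &= \Pi_1 w_1^\star, \qquad
w_2 = (I - \Pi_2)\,\Pi_1 w_1^\star + \Pi_2\, w_2^\star .
\end{aligned}
```

In this sketch, forgetting enters only through \Pi_2: the component of w_1 lying in the row space of X_2 is overwritten, while the orthogonal component is kept. For a permutation task one would take w_2^\star = P w_1^\star for a permutation matrix P (an assumption of this sketch).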

Autonomous machine learning systems that learn many tasks in sequence are prone to the catastrophic forgetting problem. Mathematical theory is needed in order to understand the extent of forgetting during continual learning. As a foundational step towards this goal, we study continual learning and catastrophic forgetting from a theoretical perspective in the simple setting of gradient descent with no explicit algorithmic mechanism to prevent forgetting. In this setting, we analytically demonstrate that overparameterization alone can mitigate forgetting in the context of a linear regression model. We consider a two-task setting motivated by permutation tasks, and show that as the overparameterization ratio becomes sufficiently high, a model trained on both tasks in sequence results in a low-risk estimator for the first task. As part of this work, we establish a non-asymptotic bound on the risk of a single linear regression task, which may be of independent interest to the field of double descent theory.
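The two-task procedure described in the abstract can be simulated numerically. The sketch below is a hypothetical stand-in, not the authors' setup: inputs are isotropic Gaussian, labels are noiseless, task 2 is task 1 with input coordinates permuted by a fixed random permutation, and `gd_least_squares`, `closest_interpolator`, `make_permutation_tasks`, and `task1_excess_risk` are illustrative names. It runs plain gradient descent on each task in turn, with nothing added to prevent forgetting, and measures the task-1 risk afterwards.

```python
import numpy as np


def gd_least_squares(X, y, w0, lr=None, steps=20_000, tol=1e-10):
    """Plain gradient descent on 0.5 * ||X w - y||^2, started from w0.

    No regularizer, replay buffer, or other anti-forgetting mechanism is used:
    the only trace of earlier tasks is the starting point w0.
    """
    if lr is None:
        # 1/L, where L = sigma_max(X)^2 is the Lipschitz constant of the gradient.
        lr = 1.0 / np.linalg.norm(X, ord=2) ** 2
    w = np.array(w0, dtype=float)  # copy, so the caller's w0 is not modified
    for _ in range(steps):
        grad = X.T @ (X @ w - y)
        if np.linalg.norm(grad) < tol:
            break
        w -= lr * grad
    return w


def closest_interpolator(X, y, w0):
    """Closed-form GD limit for p > n and full-row-rank X: the interpolating
    solution nearest to w0 in Euclidean norm."""
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)


def make_permutation_tasks(n, p, rng):
    """Hypothetical two-task data: isotropic Gaussian inputs, noiseless labels,
    and task 2 = task 1 with input coordinates permuted by a fixed permutation."""
    w_star = rng.standard_normal(p) / np.sqrt(p)  # task-1 parameter, norm about 1
    perm = rng.permutation(p)
    X1 = rng.standard_normal((n, p))
    y1 = X1 @ w_star
    X2 = rng.standard_normal((n, p))[:, perm]     # fresh samples, permuted inputs
    y2 = X2 @ w_star[perm]                        # task-2 parameter = permuted w_star
    return X1, y1, X2, y2, w_star


def task1_excess_risk(w, w_star):
    """E[(x @ w - x @ w_star)^2] for x ~ N(0, I), i.e. ||w - w_star||^2."""
    return float(np.sum((w - w_star) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    for p in (40, 200, 1000):
        X1, y1, X2, y2, w_star = make_permutation_tasks(n, p, rng)
        w1 = gd_least_squares(X1, y1, np.zeros(p))  # task 1, from zero init
        w2 = gd_least_squares(X2, y2, w1)           # task 2, starting from w1
        print(f"p={p:5d}  task-1 risk after task 1: {task1_excess_risk(w1, w_star):.3f}"
              f"  after task 2: {task1_excess_risk(w2, w_star):.3f}")
```

Because the inputs are isotropic here, even the single-task minimum-norm estimator w1 has expected excess risk of about ||w_star||^2 (1 - n/p), which grows toward ||w_star||^2 as p increases with n fixed. Sweeping p in this sketch therefore illustrates the sequential training procedure, not the paper's result, which rests on the authors' own data model and bounds.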

