LG · NA · May 19, 2022

How catastrophic can catastrophic forgetting be in linear regression?

arXiv:2205.09588v2 · 85 citations · h-index: 44
AI Analysis

This paper offers theoretical insight into catastrophic forgetting for continual-learning researchers, though its contribution is incremental because it is restricted to a specific linear setting.

The paper tackles catastrophic forgetting in continual learning by analyzing an overparameterized linear model trained on a sequence of tasks. For T tasks in d dimensions presented cyclically for k iterations, it proves an upper bound of T^2 * min{1/sqrt(k), d/k} on the forgetting, and it shows that the T^2 factor can be removed when tasks are presented in a random ordering.

To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions. We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds. We establish connections between continual learning in the linear setting and two other research areas: alternating projections and the Kaczmarz method. In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas. In particular, when T tasks in d dimensions are presented cyclically for k iterations, we prove an upper bound of T^2 * min{1/sqrt(k), d/k} on the forgetting. This stands in contrast to the convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results. We further show that the T^2 factor can be lifted when tasks are presented in a random ordering.
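
The cyclic setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's code, and it makes simplifying assumptions: each task is a single unit-norm sample (x, y) with a noiseless label y = x . w_star, so fitting a task to convergence from the current weights reduces to a Kaczmarz projection onto the hyperplane x . w = y. Forgetting is taken here to be the mean squared residual over all tasks after each full cycle. The helper names (make_tasks, project_onto_task, forgetting, run_cyclic) are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_task(w, x, y):
    """Kaczmarz step: the closest point to w that satisfies x . w == y."""
    scale = (y - dot(x, w)) / dot(x, x)
    return [wi + scale * xi for wi, xi in zip(w, x)]


def forgetting(w, tasks):
    """Mean squared residual of w over all tasks (assumed definition)."""
    return sum((dot(x, w) - y) ** 2 for x, y in tasks) / len(tasks)


def make_tasks(num_tasks, dim, seed=0):
    """Random unit-norm rank-1 tasks labelled by a shared w_star (assumption)."""
    rng = random.Random(seed)
    w_star = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    tasks = []
    for _ in range(num_tasks):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = dot(x, x) ** 0.5
        x = [xi / norm for xi in x]
        tasks.append((x, dot(x, w_star)))
    return tasks, w_star


def run_cyclic(tasks, dim, cycles):
    """Fit the tasks in a fixed cyclic order, starting from w = 0."""
    w = [0.0] * dim
    history = []
    for _ in range(cycles):
        for x, y in tasks:
            w = project_onto_task(w, x, y)
        history.append(forgetting(w, tasks))  # measured after each full cycle
    return w, history


# Overparameterized: 5 tasks in 20 dimensions.
tasks, w_star = make_tasks(num_tasks=5, dim=20)
w, history = run_cyclic(tasks, dim=20, cycles=50)
print("forgetting after first cycle:", history[0])
print("forgetting after last cycle: ", history[-1])
```

Replacing the fixed order in run_cyclic with a task drawn at random on each step would give the random-ordering variant mentioned in the abstract. The sketch only tracks forgetting and does not measure distance to the offline solution.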

