Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
This work addresses forgetting in sequential training and offers AI researchers insights into over-parameterized networks, though it is incremental in that it builds on the known problem of catastrophic interference.
The study tackled catastrophic interference in neural networks trained on a fixed, repeated sequence of documents, finding that large language models exhibit anticipatory recovery from forgetting before re-encountering documents, with this behavior strengthening as model size increases.
We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs finetuned sequentially in this setting: they exhibit anticipatory behavior, recovering from forgetting on documents before encountering them again. This behavior occurs even though the documents are never presented in context together. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we demonstrate a new mechanism by which over-parameterized neural networks can recover from catastrophic interference and uncover new insights into training over-parameterized networks in cyclically structured environments.
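The cyclic setting described above can be sketched as a small measurement harness. This is a minimal illustration, not the authors' code: the names `cyclic_schedule`, `probe_loss_curve`, `train_step`, `eval_loss`, `probe_index`, and `steps_per_doc` are all hypothetical, and the idea of taking several update steps per document is an assumption. The harness only records the loss on one probed document after each document in the cycle is trained on; it does not itself reproduce anticipatory recovery, which depends on the model being trained.

```python
from typing import Callable, List, Sequence, TypeVar

Doc = TypeVar("Doc")


def cyclic_schedule(num_docs: int, num_epochs: int) -> List[int]:
    """Document indices in a fixed order, repeated for num_epochs cycles."""
    return [i for _ in range(num_epochs) for i in range(num_docs)]


def probe_loss_curve(
    docs: Sequence[Doc],
    train_step: Callable[[Doc], None],   # hypothetical: one update on one document
    eval_loss: Callable[[Doc], float],   # hypothetical: loss on one document, no update
    num_epochs: int,
    probe_index: int = 0,
    steps_per_doc: int = 1,              # assumption: updates taken per document visit
) -> List[float]:
    """Train on documents cyclically and track the loss on a single probe document.

    Each document is trained on by itself, so no two documents ever share a
    context. The returned list has one entry per document visit.
    """
    curve: List[float] = []
    for idx in cyclic_schedule(len(docs), num_epochs):
        for _ in range(steps_per_doc):
            train_step(docs[idx])
        # Measure the probe document after every visit, including visits
        # to other documents, so forgetting and recovery are both visible.
        curve.append(eval_loss(docs[probe_index]))
    return curve
```

With a real model, catastrophic interference would appear as the probe loss rising after the probe document is trained on, and anticipatory recovery would appear as that loss falling again in the visits just before the probe document comes back around in the next cycle.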