ML · LG · May 18, 2022

Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation

arXiv:2205.09029v1 · 21 citations · h-index: 43
Originality: Incremental advance
AI Analysis

This work addresses a key challenge in continual learning for AI systems, providing theoretical insights that are incremental but clarify when existing methods work best.

The paper tackles the problem of catastrophic forgetting in continual learning, where neural networks lose performance on old tasks when learning new ones, and finds that forgetting is worst at intermediate task similarity due to a trade-off between node activation and node re-use.

Continual learning - learning new tasks in sequence while maintaining performance on old tasks - remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon that we name Maslow's hammer hypothesis. Our analysis reveals the presence of a trade-off between node activation and node re-use that results in worst forgetting in the intermediate regime. Using this understanding we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off, and identify the regimes in which they are most effective.

Code Implementations: 1 repo
