LGAI · Feb 11, 2025

Forget Forgetting: Continual Learning in a World of Abundant Memory

arXiv:2502.07274v4 · 6 citations · h-index: 30
Originality: Highly original
AI Analysis

This work challenges traditional assumptions in continual learning, offering a scalable solution for real-world systems where GPU time, not storage, is the bottleneck.

The paper studies continual learning in systems with abundant memory and finds that the core challenge shifts from stability to plasticity. To address this trade-off, it proposes Weight Space Consolidation, which outperforms strong baselines while matching the low computational cost of replay.

Continual learning (CL) has traditionally focused on minimizing exemplar memory, a constraint often misaligned with modern systems where GPU time, not storage, is the primary bottleneck. This paper challenges this paradigm by investigating a more realistic regime: one where memory is abundant enough to mitigate forgetting, but full retraining from scratch remains prohibitively expensive. In this practical "middle ground", we find that the core challenge shifts from stability to plasticity, as models become biased toward prior tasks and struggle to learn new ones. Conversely, improved stability allows simple replay baselines to outperform the state-of-the-art methods at a fraction of the GPU cost. To address this newly surfaced trade-off, we propose Weight Space Consolidation, a lightweight method that combines (1) rank-based parameter resets to restore plasticity with (2) weight averaging to enhance stability. Validated on both class-incremental learning with image classifiers and continual instruction tuning with large language models, our approach outperforms strong baselines while matching the low computational cost of replay, offering a scalable alternative to expensive full-retraining. These findings challenge long-standing CL assumptions and establish a new, cost-efficient baseline for real-world CL systems where exemplar memory is no longer the limiting factor.
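
The two ingredients named in the abstract, rank-based parameter resets and weight averaging, can be illustrated with a minimal sketch. The abstract does not say how parameters are ranked or what they are reset to, so everything specific here is an assumption made for illustration: parameters are ranked by how far they drifted from the previous-task weights, the low-ranked ones are reset to those previous-task values, and stability comes from a plain running mean. The names `rank_based_reset`, `update_running_average`, and `keep_fraction` are hypothetical and do not come from the paper.

```python
from typing import List


def rank_based_reset(current: List[float], anchor: List[float],
                     keep_fraction: float) -> List[float]:
    """Keep the parameters that moved most; reset the rest to the anchor.

    ASSUMPTION: rank = |current - anchor| (drift from previous-task weights).
    The paper's actual ranking criterion may differ.
    """
    n = len(current)
    n_keep = max(1, int(round(keep_fraction * n)))
    drift = [abs(c - a) for c, a in zip(current, anchor)]
    # Indices sorted from largest drift to smallest.
    ranked = sorted(range(n), key=lambda i: drift[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [current[i] if i in keep else anchor[i] for i in range(n)]


def update_running_average(average: List[float], new: List[float],
                           num_averaged: int) -> List[float]:
    """Fold `new` into a running mean that already covers `num_averaged` snapshots."""
    return [(a * num_averaged + w) / (num_averaged + 1)
            for a, w in zip(average, new)]


# Toy flattened weight vectors (illustrative values only).
anchor = [0.0, 1.0, -1.0, 0.5]      # weights after the previous task
current = [0.9, 1.05, -1.6, 0.52]   # weights partway through the new task

# Keep the half that drifted most (indices 0 and 2), reset the others.
reset = rank_based_reset(current, anchor, keep_fraction=0.5)
# reset == [0.9, 1.0, -1.6, 0.5]

# Average the reset weights with the anchor (one snapshot so far).
averaged = update_running_average(anchor, reset, num_averaged=1)
```

In this toy run the reset leaves only the two most-changed weights free to carry new-task information, while the running mean pulls the final weights back toward the previous-task solution. In a real model the same operations would apply per tensor, at an interval chosen by the practitioner.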

