LG | AI | Jan 9, 2025

Soup to go: mitigating forgetting during continual learning with model averaging

arXiv:2501.05559v1 | 11 citations | h-index: 96
Originality: Incremental advance
AI Analysis

This paper addresses performance degradation on earlier tasks in continual learning and offers AI practitioners a computationally efficient solution. The advance is incremental, since it builds on existing merging techniques.

The paper tackles catastrophic forgetting in continual learning by proposing Sequential Fine-tuning with Averaging (SFA), which merges the model currently being trained with earlier checkpoints during training. It achieves results comparable to state-of-the-art methods without storing past data or multiple parameter copies.

In continual learning, where task data arrives in a sequence, fine-tuning on later tasks will often lead to performance degradation on earlier tasks. This is especially pronounced when these tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned at minimal computational expense? Inspired by other merging methods and L2-regression, we propose Sequential Fine-tuning with Averaging (SFA), a method that merges currently training models with earlier checkpoints during the course of training. SOTA approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. In contrast, our method achieves comparable results without the need to store past data or multiple copies of parameters for each gradient step. Furthermore, our method outperforms common merging techniques such as Task Arithmetic, TIES Merging, and WiSE-FT, as well as other penalty methods like L2 and Elastic Weight Consolidation. In turn, our method offers insight into the benefits of merging partially-trained models during training across both image and language domains.
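
The idea of merging a model with an earlier checkpoint during training can be illustrated with a small sketch. This is not the authors' implementation: the function names (`merge_params`, `sfa_finetune`), the interpolation weight `beta`, the fixed merge interval `merge_every`, and the toy quadratic tasks are all assumptions chosen for illustration. The sketch keeps a single checkpoint from the start of the current task and periodically interpolates the live parameters toward it.

```python
import numpy as np

def merge_params(current, checkpoint, beta):
    """Interpolate per parameter: beta * checkpoint + (1 - beta) * current."""
    return {k: beta * checkpoint[k] + (1.0 - beta) * current[k] for k in current}

def sfa_finetune(params, grad_fn, steps, lr=0.1, merge_every=10, beta=0.5):
    """Gradient descent on one task, periodically merging with the
    checkpoint taken at the start of that task (assumed schedule)."""
    checkpoint = {k: v.copy() for k, v in params.items()}  # the only stored copy
    params = {k: v.copy() for k, v in params.items()}
    for step in range(1, steps + 1):
        grads = grad_fn(params)
        params = {k: params[k] - lr * grads[k] for k in params}
        if step % merge_every == 0:
            # Pull the partially trained model back toward the earlier checkpoint.
            params = merge_params(params, checkpoint, beta)
    return params

def make_task(target):
    """Toy task (hypothetical): loss ||w - target||^2, gradient 2 * (w - target)."""
    return lambda p: {"w": 2.0 * (p["w"] - target)}

start = {"w": np.zeros(2)}
# Task A with beta=0.0 reduces to plain fine-tuning (merging is a no-op).
after_a = sfa_finetune(start, make_task(np.array([1.0, 0.0])), steps=50, beta=0.0)
# Task B with merging: the result stays between the task A and task B optima.
after_b = sfa_finetune(after_a, make_task(np.array([0.0, 1.0])), steps=50, beta=0.5)
```

In this toy setting, a larger `beta` keeps the final parameters closer to the task A solution, while `beta=0.0` recovers ordinary sequential fine-tuning. No past data and only one extra copy of the parameters are held in memory.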

