LG AI · Aug 20, 2024

Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning

Yuqing Zhao, Jiannong Cao, Divya Saxena, Xiaoyun Liu, Changlin Song, Bo Yuan, Julie McCann

arXiv:2408.10566v54.61 citations · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses a critical issue for building scalable lifelong learning systems: forgetting caused by model growth in continual learning. The contribution appears incremental, since it builds on existing model growth methods by adding sparsity controls.

The paper tackles growth-induced forgetting in task-agnostic continual learning, where improper model growth leads to severe degradation of learned knowledge. It proposes SparseGrow, which uses gradient and parameter sparsity to achieve high adaptability while minimizing forgetting, as validated in extensive experiments.

In continual learning (CL), model growth enhances adaptability to new data. However, when model growth is applied improperly, especially in task-agnostic CL, where the entire grown model is used for inference, it can severely degrade learned knowledge, a problem we term growth-induced forgetting. Most existing methods that adopt model growth to improve adaptability overlook this forgetting issue, which compromises knowledge retention and makes them unsuitable for task-agnostic settings. To promote both adaptability and knowledge retention with model growth, we identify the key: gradient and parameter sparsity. We introduce SparseGrow, which increases gradient sparsity through layer expansion and gradient gating, enabling focused parameter updates while preserving critical parameters and thus inhibiting forgetting. Moreover, it promotes parameter sparsity through sparse initialization and training, aiming at better control of model plasticity and improved adaptability to new data. Extensive experiments across diverse datasets, task-agnostic settings, and a large number of tasks demonstrate the necessity of controlled layer expansion and validate the effectiveness of SparseGrow in achieving high adaptability while minimizing forgetting in continual learning. By enabling model growth with sparsified gradients and parameters, SparseGrow paves the way for building scalable lifelong learning systems capable of continual adaptation with better knowledge retention.
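
The ideas named in the abstract (layer expansion, gradient gating, sparse initialization) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `grow_layer` and `gated_sgd_step`, the `density` parameter, and the hard 0/1 gate that blocks every pre-existing weight are all assumptions made for illustration, and the paper's gating may well be more selective.

```python
import random


def grow_layer(weights, new_rows, density=0.25, scale=0.1, seed=0):
    """Append sparsely initialized rows to a weight matrix (list of lists).

    Hypothetical helper. Returns the grown matrix and a gate mask of the
    same shape: 0.0 for pre-existing weights (gradient blocked) and 1.0
    for newly added weights (gradient passes through).
    """
    rng = random.Random(seed)
    n_in = len(weights[0])
    grown = [row[:] for row in weights]
    gate = [[0.0] * n_in for _ in weights]
    for _ in range(new_rows):
        # Sparse initialization (assumed scheme): each new weight is
        # nonzero only with probability `density`.
        grown.append([
            rng.gauss(0.0, scale) if rng.random() < density else 0.0
            for _ in range(n_in)
        ])
        gate.append([1.0] * n_in)
    return grown, gate


def gated_sgd_step(weights, grads, gate, lr=0.1):
    """One SGD step in which each gradient is multiplied by its gate value."""
    return [
        [w - lr * g * m for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(weights, grads, gate)
    ]


# Weights learned on earlier data (2 output units, 3 inputs).
old = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]

# Grow the layer by two output units, then take one gated update step.
grown, gate = grow_layer(old, new_rows=2)
grads = [[1.0] * 3 for _ in grown]  # placeholder gradients, illustration only
updated = gated_sgd_step(grown, grads, gate)
```

In this sketch the first two rows of `updated` are identical to `old`, because their gradients are gated to zero, while the two new rows absorb the whole update. A softer gate with values between 0 and 1 would trade some retention for plasticity.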
