LGCVJun 26, 2023

Parameter-Level Soft-Masking for Continual Learning

arXiv:2306.14775v164 citationsh-index: 87
Originality Incremental advance
AI Analysis

This work addresses key bottlenecks in continual learning for AI systems, offering a novel approach to improve efficiency and performance, though it is incremental in the context of existing masking methods.

The paper tackles the problem of catastrophic forgetting, limited knowledge transfer, and network capacity over-consumption in continual learning by proposing SPG, a parameter-level soft-masking technique that achieves all three objectives, enabling significant knowledge transfer even among dissimilar tasks while mitigating forgetting.

Existing research on task incremental learning in continual learning has primarily focused on preventing catastrophic forgetting (CF). Although several techniques have achieved learning with no CF, they attain it by letting each task monopolize a sub-network in a shared network, which seriously limits knowledge transfer (KT) and causes over-consumption of the network capacity, i.e., as more tasks are learned, the performance deteriorates. The goal of this paper is threefold: (1) overcoming CF, (2) encouraging KT, and (3) tackling the capacity problem. A novel technique (called SPG) is proposed that soft-masks (partially blocks) parameter updating in training based on the importance of each parameter to old tasks. Each task still uses the full network, i.e., no monopoly of any part of the network by any task, which enables maximum KT and reduction in capacity usage. To our knowledge, this is the first work that soft-masks a model at the parameter-level for continual learning. Extensive experiments demonstrate the effectiveness of SPG in achieving all three objectives. More notably, it attains significant transfer of knowledge not only among similar tasks (with shared knowledge) but also among dissimilar tasks (with little shared knowledge) while mitigating CF.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes