NEAILGApr 29, 2024

CLASSP: a Biologically-Inspired Approach to Continual Learning through Adjustment Suppression and Sparsity Promotion

arXiv:2405.09637v22 citationsh-index: 1
Originality Incremental advance
AI Analysis

This work addresses catastrophic forgetting for AI systems that need to learn sequentially, representing an incremental improvement over existing methods like EWC.

The paper tackles the problem of catastrophic forgetting in continual learning by introducing CLASSP, a biologically-inspired method based on adjustment suppression and sparsity promotion, which outperforms Elastic Weight Consolidation in accuracy and memory footprint on computer vision and sentiment analysis datasets.

This paper introduces a new biologically-inspired training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP). CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation (LTP). The first principle is a decay rate over the weight adjustment, which is implemented as a generalization of the AdaGrad optimization algorithm. This means that weights that have received many updates should have lower learning rates as they likely encode important information about previously seen data. However, this principle results in a diffuse distribution of updates throughout the model, as it promotes updates for weights that haven't been previously updated, while a sparse update distribution is preferred to leave weights unassigned for future tasks. Therefore, the second principle introduces a threshold on the loss gradient. This promotes sparse learning by updating a weight only if the loss gradient with respect to that weight is above a certain threshold, i.e. only updating weights with a significant impact on the current loss. Both principles reflect phenomena observed in LTP, where a threshold effect and a gradual saturation of potentiation have been observed. CLASSP is implemented in a Python/PyTorch class, making it applicable to any model. When compared with Elastic Weight Consolidation (EWC) using Computer Vision and sentiment analysis datasets, CLASSP demonstrates superior performance in terms of accuracy and memory footprint.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes