CL | Dec 11, 2024

Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck

arXiv:2412.08528v2 | 2 citations | h-index: 29 | ICNLSP
Originality: Incremental advance
AI Analysis

This paper addresses efficient continual learning for NLP models, particularly in small-scale applications. The contribution is incremental: it adapts an existing bottleneck method from vision to NLP.

The paper tackles the problem of catastrophic forgetting in continual learning for small language models by introducing a discrete key-value bottleneck (DKVB), achieving competitive performance with lower computational costs in various scenarios, including single-head settings without task IDs.

Continual learning remains a challenge across various natural language processing (NLP) tasks, as models updated with new training data often risk catastrophic forgetting of previously acquired knowledge. We introduce a discrete key-value bottleneck (DKVB) for encoder-only language models, enabling efficient continual learning through localized updates. Inspired by a discrete key-value bottleneck in vision, we consider new and NLP-specific challenges. We compare different bottleneck architectures for NLP and introduce a new, task-independent initialization technique for the discrete keys. We evaluate our DKVB for NLP in four continual learning scenarios and show that it alleviates catastrophic forgetting. Our experiments demonstrate that the proposed approach achieves competitive performance compared to popular continual learning methods while incurring lower computational costs. Furthermore, we show that DKVB remains effective even in challenging single-head continual learning scenarios where no task ID is provided.
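To make "localized updates" concrete, here is a minimal NumPy sketch of the general key-value bottleneck idea. It is not the paper's implementation. It assumes a single codebook, randomly initialized frozen keys, nearest-key lookup by Euclidean distance, and values that directly store class logits. The sizes and the names `lookup`, `predict`, and `train_step` are hypothetical, and a random vector stands in for the output of a frozen encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_KEYS, KEY_DIM, NUM_CLASSES = 16, 8, 3

# Frozen keys: never updated after initialization (random here, an assumption).
keys = rng.normal(size=(NUM_KEYS, KEY_DIM))
# Trainable values: one vector of class logits per key.
values = np.zeros((NUM_KEYS, NUM_CLASSES))


def lookup(embedding):
    """Return the index of the key closest to the given embedding."""
    distances = np.linalg.norm(keys - embedding, axis=1)
    return int(np.argmin(distances))


def predict(embedding):
    """Use the value stored under the nearest key as class logits."""
    return values[lookup(embedding)]


def train_step(embedding, label, lr=0.5):
    """Cross-entropy gradient step that modifies only the selected value row."""
    idx = lookup(embedding)
    logits = values[idx]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = probs.copy()
    grad[label] -= 1.0          # d(cross-entropy)/d(logits) = softmax - one_hot
    values[idx] -= lr * grad    # in-place update of a single row
    return idx


# Stand-in for a frozen encoder's pooled output (assumption).
embedding = rng.normal(size=KEY_DIM)
snapshot = values.copy()
updated_row = train_step(embedding, label=1)
changed_rows = np.where(np.any(values != snapshot, axis=1))[0]
```

After one step, `changed_rows` contains only `updated_row`. Values attached to other keys, which may encode earlier tasks, are left untouched, and this is the intuition behind using such a bottleneck to reduce catastrophic forgetting.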

