CL | Dec 23, 2024

Interweaving Memories of a Siamese Large Language Model

arXiv:2412.17383v1 | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a critical issue for practitioners who use PEFT methods to adapt LLMs to downstream tasks without losing general knowledge, though the contribution appears incremental because it builds on existing PEFT approaches.

The paper tackles the problem of catastrophic forgetting in parameter-efficient fine-tuning (PEFT) of large language models by proposing IMSM, a model-agnostic framework that interweaves memories from pre-trained and fine-tuned parameters, resulting in significant performance improvements and effective mitigation of forgetting while maintaining efficiency.

Parameter-efficient fine-tuning (PEFT) methods optimize large language models (LLMs) by modifying or introducing a small number of parameters to enhance alignment with downstream tasks. However, they can result in catastrophic forgetting, where LLMs prioritize new knowledge at the expense of comprehensive world knowledge. A promising approach to mitigate this issue is to recall prior memories based on the original knowledge. To this end, we propose a model-agnostic PEFT framework, IMSM, which Interweaves Memories of a Siamese Large Language Model. Specifically, our siamese LLM is equipped with an existing PEFT method. Given an incoming query, it generates two distinct memories based on the pre-trained and fine-tuned parameters. IMSM then incorporates an interweaving mechanism that regulates the contributions of both original and enhanced memories when generating the next token. This framework is theoretically applicable to all open-source LLMs and existing PEFT methods. We conduct extensive experiments across various benchmark datasets, evaluating the performance of popular open-source LLMs using the proposed IMSM, in comparison to both classical and leading PEFT methods. Our findings indicate that IMSM maintains comparable time and space efficiency to backbone PEFT methods while significantly improving performance and effectively mitigating catastrophic forgetting.
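The abstract describes the interweaving mechanism only at a high level: two memories, one from the pre-trained parameters and one from the fine-tuned parameters, are combined with regulated contributions before the next token is generated. The sketch below illustrates one plausible reading of that idea, a per-dimension sigmoid gate over the two hidden states. The gate form, the function name `interweave`, and the parameters `gate_weights` and `gate_bias` are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def interweave(
    h_pre: Sequence[float],
    h_ft: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Mix two hidden states with a learned per-dimension gate.

    h_pre: hidden state from the frozen pre-trained parameters.
    h_ft:  hidden state from the PEFT-adapted parameters.
    gate_weights, gate_bias: hypothetical gate parameters; row i of
        gate_weights has length 2 * len(h_pre) and scores the
        concatenation [h_pre; h_ft] for output dimension i.
    """
    if len(h_pre) != len(h_ft):
        raise ValueError("h_pre and h_ft must have the same dimension")

    concat = list(h_pre) + list(h_ft)
    mixed = []
    for i, (p, f) in enumerate(zip(h_pre, h_ft)):
        logit = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(logit)  # g near 1 favours the fine-tuned memory
        mixed.append(g * f + (1.0 - g) * p)
    return mixed


# With zero gate parameters the gate is 0.5, so the output is the average.
h_pre = [0.0, 2.0]
h_ft = [2.0, 4.0]
zero_w = [[0.0] * 4 for _ in range(2)]
print(interweave(h_pre, h_ft, zero_w, [0.0, 0.0]))
```

In this reading, a gate close to 0 falls back on the original memory, which is how such a mechanism could limit forgetting, while a gate close to 1 defers to the fine-tuned memory. In a real model the mixed state would feed the output head that produces next-token logits, and the gate parameters would be trained together with the PEFT parameters.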

Code Implementations: 1 repo
