CL AI LG
Aug 19, 2024

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

Jiaang Li, Quan Wang, Zhongnan Wang, Yongdong Zhang, Zhendong Mao

arXiv:2408.11869v36.17 citations
h-index: 8
Has Code

Originality: Incremental advance

AI Analysis

The paper addresses robust and scalable knowledge updates for LLM users, though the advance is incremental because it builds on existing parameter-efficient fine-tuning methods.

The paper tackles lifelong model editing in large language models, where sequential updates cause forgetting. It proposes ELDER, which combines a mixture of LoRAs with a router network to create a continuous association between data and adapters. ELDER outperforms eight baselines on GPT-2 XL and LLaMA2-7B while preserving the models' general abilities.

Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are designed solely for single-time use and result in a significant forgetting effect in lifelong editing scenarios, where sequential edits are conducted over time. Previous approaches manage sequential edits by freezing the original parameters and discretely allocating new parameters for each knowledge update. However, these methods lack robustness to minor input variations because of the discrete mapping between data and parameters. To overcome this challenge, we propose ELDER, a novel approach that creates a continuous association between data and adapters. ELDER integrates multiple LoRAs through a router network and is trained to establish a smooth data-adapter association, thereby enhancing edit robustness and generalization to semantically equivalent inputs. To ensure that inputs containing the same knowledge are processed by the same LoRAs, we design a novel loss that guides the model to link LoRA allocations with edit knowledge. Furthermore, we propose a deferral mechanism to retain the original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting, outperforming eight baselines while exhibiting strong scalability and preserving LLMs' general abilities on downstream tasks. Our code is available at https://github.com/JiaangL/ELDER.
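
To make the idea of integrating multiple LoRAs through a router network more concrete, here is a minimal NumPy sketch of a single linear layer with a frozen base weight and a router-weighted sum of low-rank updates. It is an illustration under stated assumptions, not the authors' implementation: the class name `MixtureOfLoRALinear`, the dense softmax gating, the initialization scales, and all dimensions are hypothetical. The guiding loss and the deferral mechanism described in the abstract are not modeled here.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class MixtureOfLoRALinear:
    """Frozen linear layer plus a router-weighted sum of LoRA updates.

    Hypothetical illustration only; names and gating choice are assumptions.
    """

    def __init__(self, d_in, d_out, n_adapters=4, rank=2, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base weight (stands in for the pretrained layer).
        self.W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
        # LoRA factors: A projects down to `rank`, B projects back up.
        self.A = rng.normal(size=(n_adapters, d_in, rank)) * 0.01
        self.B = np.zeros((n_adapters, rank, d_out))  # zero init: no change at start
        # Router maps each input to one score per adapter.
        self.router = rng.normal(size=(d_in, n_adapters)) * 0.01

    def route(self, x):
        """Return continuous mixing weights of shape (batch, n_adapters)."""
        return softmax(x @ self.router)

    def forward(self, x):
        gates = self.route(x)                            # (batch, n_adapters)
        low = np.einsum("bi,nir->bnr", x, self.A)        # per-adapter down-projection
        delta = np.einsum("bnr,nro->bno", low, self.B)   # per-adapter low-rank update
        # Base output plus the gate-weighted combination of adapter outputs.
        return x @ self.W + np.einsum("bn,bno->bo", gates, delta)


layer = MixtureOfLoRALinear(d_in=8, d_out=4)
x = np.random.default_rng(1).normal(size=(3, 8))
y = layer.forward(x)  # shape (3, 4)
```

Because the mixing weights vary smoothly with the input, two nearby inputs receive similar adapter combinations. That is the intuition behind a continuous data-adapter association, in contrast to a discrete lookup that assigns each edit its own parameters.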
