CL · Aug 19, 2024

Resolving Lexical Bias in Model Editing

Hammad Rizwan, Domenic Rosati, Ga Wu, Hassan Sajjad

arXiv:2408.10411v3 · 3.43 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses a critical vulnerability in model editing for large language models and offers a more precise and efficient solution, though the contribution is incremental because it builds on existing adapter techniques.

The paper tackles lexical bias in model editing: current adapter methods wrongly apply edits to irrelevant prompts that merely share words with the edited prompt. It presents a method that learns a disentangled representation space, achieving state-of-the-art results with better inference-time efficiency.
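To make this failure mode concrete, here is a minimal sketch, assuming an adapter that applies an edit whenever a prompt's cosine similarity to a stored edit key reaches a threshold. As a hypothetical stand-in for a lexically biased representation it uses bag-of-words counts; real adapters compare model hidden states, and the prompts, `edit_fires`, and the 0.5 threshold are illustrative assumptions, not the paper's setup.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Hypothetical stand-in for a lexically biased representation: token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def edit_fires(prompt: str, edit_key: str, threshold: float = 0.5) -> bool:
    """Hypothetical trigger: apply the edit if similarity to the key reaches the threshold."""
    return cosine(bow_vector(prompt), bow_vector(edit_key)) >= threshold


# Illustrative prompts (assumptions, not from the paper).
EDIT_KEY = "who is the president of the united states"
PARAPHRASE = "name the current us head of state"      # same meaning, few shared words
IRRELEVANT = "who is the president of the chess club"  # many shared words, different meaning

if __name__ == "__main__":
    for label, prompt in [("paraphrase", PARAPHRASE), ("irrelevant", IRRELEVANT)]:
        score = cosine(bow_vector(prompt), bow_vector(EDIT_KEY))
        print(f"{label:>10}: sim={score:.2f} fires={edit_fires(prompt, EDIT_KEY)}")
```

Running it prints a similarity of 0.36 for the paraphrase (edit not applied) and 0.80 for the irrelevant prompt (edit wrongly applied): word overlap, not meaning, decides whether the edit triggers.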

Model editing aims to modify the outputs of large language models after they are trained. Previous approaches have often involved direct alterations to model weights, which can result in model degradation. Recent techniques avoid making modifications to the model's weights by using an adapter that applies edits to the model when triggered by semantic similarity in the representation space. We demonstrate that current adapter methods are critically vulnerable to strong lexical biases, leading to issues such as applying edits to irrelevant prompts with overlapping words. This paper presents a principled approach to learning a disentangled representation space that facilitates precise localization of edits by maintaining distance between irrelevant prompts while preserving proximity among paraphrases. In our empirical study, we show that our method (Projector Editor Networks for Model Editing - PENME) achieves state-of-the-art model editing results while being more computationally efficient during inference than previous methods and adaptable across different architectures.
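The abstract describes learning a representation space that keeps paraphrases close while pushing irrelevant prompts apart. A minimal sketch of that idea, assuming a standard pairwise contrastive (margin) loss and a square linear projector trained by plain gradient descent; PENME's actual architecture, loss, and trigger are not reproduced here. The 2-D vectors, `train_projector`, `edit_fires`, and the `margin`/`radius` values are hypothetical. In the raw space the irrelevant prompt (distance 0.8) is closer to the edit key than the paraphrase (distance 1.0), so no single radius can include the paraphrase while excluding it.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def projected_distance(W, a, b):
    """Euclidean distance between a and b after the linear projection W."""
    return norm(matvec(W, sub(a, b)))


def pair_grad(W, anchor, other, is_paraphrase, margin):
    """Gradient of one pairwise contrastive term with respect to W.

    Paraphrase pairs: loss = d^2 (pull together).
    Irrelevant pairs: loss = max(0, margin - d)^2 (push apart up to the margin).
    """
    diff = sub(anchor, other)
    z = matvec(W, diff)  # projected difference W(anchor - other)
    d = norm(z)
    if is_paraphrase:
        coef = [2.0 * zi for zi in z]
    elif 0.0 < d < margin:
        coef = [-2.0 * (margin - d) * zi / d for zi in z]
    else:
        # Hinge inactive (or d == 0, where the gradient is undefined): no update.
        coef = [0.0] * len(z)
    # dL/dW[i][j] = coef[i] * diff[j] (outer product)
    return [[c * dj for dj in diff] for c in coef]


def train_projector(pairs, dim, margin=1.0, lr=0.1, steps=200):
    """Full-batch gradient descent on a square linear projector, starting at identity."""
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for anchor, other, is_para in pairs:
            g = pair_grad(W, anchor, other, is_para, margin)
            for i in range(dim):
                for j in range(dim):
                    grad[i][j] += g[i][j]
        W = [[W[i][j] - lr * grad[i][j] for j in range(dim)] for i in range(dim)]
    return W


def edit_fires(W, prompt_vec, edit_key, radius=0.5):
    """Hypothetical trigger: apply the edit only if the prompt projects within `radius` of the key."""
    return projected_distance(W, prompt_vec, edit_key) <= radius


# Toy 2-D representations (hypothetical): axis 0 ~ surface-word overlap, axis 1 ~ meaning.
EDIT_KEY = [1.0, 1.0]
PARAPHRASE = [0.0, 1.0]  # same meaning, few shared words
IRRELEVANT = [1.0, 0.2]  # shares words, different meaning

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print("raw:", projected_distance(identity, EDIT_KEY, PARAPHRASE),
          projected_distance(identity, EDIT_KEY, IRRELEVANT))
    W = train_projector([(EDIT_KEY, PARAPHRASE, True), (EDIT_KEY, IRRELEVANT, False)], dim=2)
    print("projected:", projected_distance(W, EDIT_KEY, PARAPHRASE),
          projected_distance(W, EDIT_KEY, IRRELEVANT))
    print("fires on paraphrase:", edit_fires(W, PARAPHRASE, EDIT_KEY))
    print("fires on irrelevant:", edit_fires(W, IRRELEVANT, EDIT_KEY))
```

Training shrinks the word-overlap axis toward zero and stretches the meaning axis until the irrelevant prompt sits at the margin (about 1.0), so afterwards the paraphrase falls inside the radius and the irrelevant prompt does not. A nonlinear projector and many edit/paraphrase/neighbour triples would be needed in practice; the linear toy only shows the direction of the objective.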
