SAKE: Steering Activations for Knowledge Editing
SAKE addresses the need for controlled knowledge editing in LLMs, offering improvements in contextual robustness and generalization, though the contribution is incremental within the field of knowledge editing.
The paper tackles the problem of updating specific facts in large language models by proposing SAKE, a method that models facts as distributions and uses optimal transport to edit behavior across paraphrases and logical implications, resulting in more robust edits than existing approaches.
As Large Language Models have been shown to memorize real-world facts, the need arises to update this knowledge in a controlled and efficient manner. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including a lack of contextual robustness and a failure to generalize to logical implications related to the fact. To overcome these issues, we propose SAKE, an activation steering method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM behavior over a whole fact-related distribution, defined as paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE performs more robust edits than its existing counterparts.
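To make the idea of steering a whole distribution of activations with Optimal Transport more concrete, here is a minimal sketch. It is not the SAKE implementation: the abstract does not say which transport formulation, which layer, or which activations are used. The sketch assumes that both the source activations (prompts about the original fact) and the target activations (prompts about the edited fact) are roughly Gaussian, in which case the optimal transport map has a closed form. The function names `sqrtm_psd`, `fit_gaussian_ot_map`, and `steer` are hypothetical, and random arrays stand in for real hidden states.

```python
import numpy as np


def sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def fit_gaussian_ot_map(source, target, eps=1e-6):
    """Fit the closed-form OT map between two Gaussians estimated from samples.

    source, target: arrays of shape (n_samples, dim), standing in for
    activations collected over paraphrases and implications (an assumption).
    Returns (mu_s, mu_t, A) such that T(x) = mu_t + A (x - mu_s).
    """
    dim = source.shape[1]
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    # eps * I keeps the covariances invertible when samples are scarce.
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    cov_s_half = sqrtm_psd(cov_s)
    cov_s_inv_half = np.linalg.inv(cov_s_half)
    A = cov_s_inv_half @ sqrtm_psd(cov_s_half @ cov_t @ cov_s_half) @ cov_s_inv_half
    return mu_s, mu_t, A


def steer(x, mu_s, mu_t, A):
    """Apply the affine transport map to a batch of row-vector activations."""
    return mu_t + (x - mu_s) @ A.T


# Toy usage: synthetic "activations" in place of real model hidden states.
rng = np.random.default_rng(0)
source_acts = rng.normal(size=(500, 4))
target_acts = rng.normal(size=(500, 4)) * 2.0 + 3.0
mu_s, mu_t, A = fit_gaussian_ot_map(source_acts, target_acts)
steered = steer(source_acts, mu_s, mu_t, A)
```

Under the Gaussian assumption, the steered activations match the mean and (up to the small regularizer) the covariance of the target set, so the edit is fitted to the whole set of fact-related prompts rather than to a single one. A real system would additionally need to decide where in the network the map is applied and when it should be triggered, which this sketch leaves out.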