Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models
This work addresses safety issues in large language models, offering a more stable and efficient editing approach, though it appears incremental as it builds on existing model editing methods.
The paper tackles safety concerns in large language models by introducing HORSE, a method for precise massive editing that reduces noisy gradients and enables more stable edits. It supports the method with a theoretical comparison against several popular editing methods and with experiments on two datasets across multiple LLMs.
Large language models (LLMs) exhibit exceptional performance across various domains, yet they face critical safety concerns. Model editing has emerged as an effective approach to mitigate these issues. Existing model editing methods often focus on optimizing an information matrix that blends new and old knowledge. While effective, these approaches can be computationally expensive and may cause conflicts. In contrast, we take a different perspective and focus on the Hierarchical Orthogonal Residual SprEad of the information matrix, which reduces noisy gradients and enables more stable edits. We demonstrate the effectiveness of our method, HORSE, through a clear theoretical comparison with several popular methods and extensive experiments conducted on two datasets across multiple LLMs. The results show that HORSE maintains precise massive editing across diverse scenarios. The code is available at https://github.com/XiaojieGu/HORSE