CL | AI | LG | Feb 3, 2025

Joint Localization and Activation Editing for Low-Resource Fine-Tuning

arXiv:2502.01179v4 | 5 citations | h-index: 5 | Has Code | ICML
Originality: Incremental advance
AI Analysis

This work addresses the challenge of adapting LLMs with very small datasets, which is crucial for applications with limited data availability.

The paper tackles the problem of low-resource fine-tuning for LLMs by proposing JoLA, a method that jointly learns which Transformer heads to edit and how to edit them, achieving consistent performance improvements across three benchmarks.

Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired the emergence of activation editing (or steering) techniques, which modify the activations of specific model components. Due to their extremely small parameter counts, these methods show promise for small datasets. However, their performance is highly dependent on identifying the correct modules to edit, and it is often unstable across datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit, (2) whether the intervention should be additive, multiplicative, or both, and (3) the intervention parameters themselves, that is, the vectors applied as additive offsets or multiplicative scalings to the head output. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods. The code for the method is released at https://github.com/wenlai-lavine/jola.
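The three learned components described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class `HeadEdit`, the function `edit_head_output`, the sigmoid gates, and the `(1 + m)` form of the scaling are all assumptions made for illustration, and the released repository may parameterize the gates and edits differently. The sketch only shows how one gate per edit type can switch an additive offset and a multiplicative scaling on or off for a single head output.

```python
import math
from dataclasses import dataclass
from typing import List


def sigmoid(x: float) -> float:
    """Squash a gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class HeadEdit:
    """Hypothetical per-head intervention parameters (illustrative names)."""
    add_vec: List[float]    # additive offset a
    mul_vec: List[float]    # multiplicative scaling m, applied as (1 + m)
    add_gate_logit: float   # large negative value switches the additive edit off
    mul_gate_logit: float   # large negative value switches the scaling off


def edit_head_output(z: List[float], edit: HeadEdit) -> List[float]:
    """Return (1 + g_m * m) * z + g_a * a, element-wise, for one head output z.

    Assumption: gates are plain sigmoids of learnable logits. A gate near 0
    leaves the head untouched, which is how "which heads to edit" and
    "additive, multiplicative, or both" are expressed in this sketch.
    """
    g_a = sigmoid(edit.add_gate_logit)
    g_m = sigmoid(edit.mul_gate_logit)
    return [
        (1.0 + g_m * m) * z_i + g_a * a
        for z_i, m, a in zip(z, edit.mul_vec, edit.add_vec)
    ]


# Usage: the same parameters with both gates closed, then both gates open.
z = [1.0, -2.0]
closed = HeadEdit(add_vec=[0.5, 0.5], mul_vec=[1.0, -0.5],
                  add_gate_logit=-50.0, mul_gate_logit=-50.0)
opened = HeadEdit(add_vec=[0.5, 0.5], mul_vec=[1.0, -0.5],
                  add_gate_logit=50.0, mul_gate_logit=50.0)
unchanged = edit_head_output(z, closed)
edited = edit_head_output(z, opened)
```

With both gates closed the head output passes through essentially unchanged; with both open, the full scaling and offset are applied. In a trainable version, the gate logits and the two vectors would be optimized together on the small task dataset, which is what makes the localization and the edit "joint" in this sketch.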

Code Implementations: 1 repo