Sparse Activation Editing for Reliable Instruction Following in Narratives
This work addresses unreliable instruction following in narrative-rich settings, a practical concern for language model users, though the contribution appears incremental because it builds on existing activation editing methods.
The authors address language models' difficulty following instructions in complex narrative contexts. They propose Concise-SAE, a training-free framework that identifies and edits instruction-relevant neurons using only natural language instructions, and introduce FreeInstruct, a benchmark of 1,212 examples. They report state-of-the-art instruction adherence without compromising generation quality.
Complex narrative contexts often challenge language models' ability to follow instructions, and existing benchmarks fail to capture these difficulties. To address this, we propose Concise-SAE, a training-free framework that improves instruction following by identifying and editing instruction-relevant neurons using only natural language instructions, without requiring labelled data. To thoroughly evaluate our method, we introduce FreeInstruct, a diverse and realistic benchmark of 1,212 examples that highlights the challenges of instruction following in narrative-rich settings. While initially motivated by complex narratives, Concise-SAE demonstrates state-of-the-art instruction adherence across varied tasks without compromising generation quality.
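The abstract does not spell out how instruction-relevant neurons are identified or edited, so the following is only a generic sketch of sparse-autoencoder (SAE) feature editing, not the authors' Concise-SAE procedure. It assumes a pre-trained SAE with a ReLU encoder and a linear decoder, picks the features whose mean activation rises most when an instruction is present in the prompt, and scales those features in a hidden state. All names (`encode`, `select_features`, `edit_hidden`, `top_k`, `scale`) and the contrast-based selection rule are hypothetical choices made for illustration.

```python
from statistics import fmean


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(w_enc, b_enc, hidden):
    """SAE encoder (assumed form): features = ReLU(W_enc @ hidden + b_enc)."""
    pre = [z + b for z, b in zip(matvec(w_enc, hidden), b_enc)]
    return [max(0.0, x) for x in pre]


def decode(w_dec, features):
    """SAE decoder (assumed form): linear map from features back to model space."""
    return matvec(w_dec, features)


def select_features(feats_with, feats_without, top_k):
    """Rank features by mean activation gain when the instruction is present.

    This contrast rule is an illustrative assumption, not taken from the paper.
    """
    n_features = len(feats_with[0])
    gain = [
        fmean(f[j] for f in feats_with) - fmean(f[j] for f in feats_without)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: gain[j], reverse=True)[:top_k]


def edit_hidden(hidden, w_enc, b_enc, w_dec, feature_ids, scale):
    """Scale the selected features and add only the decoded change to the hidden state.

    Adding the delta (rather than replacing the hidden state with the full
    reconstruction) avoids injecting the SAE's reconstruction error.
    """
    feats = encode(w_enc, b_enc, hidden)
    delta_feats = [0.0] * len(feats)
    for j in feature_ids:
        delta_feats[j] = feats[j] * scale - feats[j]
    delta_hidden = decode(w_dec, delta_feats)
    return [h + d for h, d in zip(hidden, delta_hidden)]


# Toy usage with an identity SAE over a 3-dimensional hidden state.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero_bias = [0.0, 0.0, 0.0]
with_instruction = [[1.0, 0.0, 2.0], [1.0, 0.0, 3.0]]
without_instruction = [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0]]

chosen = select_features(with_instruction, without_instruction, top_k=1)
edited = edit_hidden([1.0, 0.5, 2.0], identity, zero_bias, identity, chosen, scale=2.0)
```

In a real model the edit would be applied to a chosen layer's activations during generation, for example through a forward hook; the layer, the number of features, and the scaling factor would all need to be tuned and are not specified here.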