Fast & Faithful Function Vectors
For researchers using function vectors to steer LLMs, this work provides incremental improvements to efficiency and accuracy.
The paper investigates design choices in function vectors for LLM steering, finding that gradient-based head selection with LRP improves both efficiency and accuracy, and that distributed steering outperforms simple aggregation.
Function vectors (FVs) are task representations elicited during in-context learning that can be used to steer Large Language Models (LLMs). However, design choices in their formulation remain underexplored. In this work, we study the impact of varying FV definitions for instructions along two degrees of freedom: attention head selection and steering. For head selection, using gradient-based attributions with Layer-wise Relevance Propagation (LRP) substantially improves efficiency as well as accuracy. For FV steering, applying the FV in a distributed manner yields higher accuracy than simple aggregation. Our code is publicly available.
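To make the two degrees of freedom concrete, the following is a minimal NumPy sketch and not the paper's implementation. It assumes per-head importance scores are already available (random numbers stand in for attribution scores here; no LRP is computed), and it assumes one plausible reading of the two steering variants: "aggregation" sums the selected heads' mean outputs into a single vector, while "distributed" keeps one steering vector per layer. All function names, shapes, and values are hypothetical.

```python
import numpy as np


def select_top_heads(scores, k):
    """Return (layer, head) pairs of the k highest-scoring heads."""
    flat = np.argsort(scores, axis=None)[::-1][:k]
    layers, heads = np.unravel_index(flat, scores.shape)
    return list(zip(layers.tolist(), heads.tolist()))


def aggregated_vector(mean_head_out, selected):
    """Assumed 'simple aggregation': sum selected head outputs into one vector."""
    fv = np.zeros(mean_head_out.shape[-1])
    for layer, head in selected:
        fv += mean_head_out[layer, head]
    return fv


def distributed_vectors(mean_head_out, selected):
    """Assumed 'distributed' variant: one steering vector per layer."""
    n_layers, _, d_model = mean_head_out.shape
    per_layer = np.zeros((n_layers, d_model))
    for layer, head in selected:
        per_layer[layer] += mean_head_out[layer, head]
    return per_layer


# Toy stand-ins (hypothetical): random scores and mean head outputs.
rng = np.random.default_rng(0)
n_layers, n_heads, d_model = 4, 3, 8
scores = rng.normal(size=(n_layers, n_heads))
mean_head_out = rng.normal(size=(n_layers, n_heads, d_model))

selected = select_top_heads(scores, k=5)
fv = aggregated_vector(mean_head_out, selected)          # added at one layer
per_layer = distributed_vectors(mean_head_out, selected)  # added layer by layer
```

In this sketch the per-layer vectors sum to the aggregated vector, so the two variants inject the same total signal and differ only in where it enters the residual stream. In a real model, the aggregated vector would be added to the hidden state at a single chosen layer, whereas each row of `per_layer` would be added at its own layer.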