LG · Nov 11, 2024

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

Madeline Brumley, Joe Kwon, David Krueger, Dmitrii Krasheninnikov, Usman Anwar

arXiv:2411.07213v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for quantitative comparisons between interpretability-based approaches to steering models, but it is incremental: it builds on existing methods without introducing new ones.

The study compares a bottom-up steering method (function vectors) with a top-down one (in-context vectors) for large language models on in-context learning tasks. It finds that each method excels on specific task types: in-context vectors perform better at behavioral shifting, while function vectors are better on tasks requiring precision.

A key objective of interpretability research on large language models (LLMs) is to develop methods for robustly steering models toward desired behaviors. To this end, two distinct approaches to interpretability, "bottom-up" and "top-down", have been presented, but there has been little quantitative comparison between them. We present a case study comparing the effectiveness of representative vector steering methods from each branch: function vectors (FV; arXiv:2310.15213) as a bottom-up method, and in-context vectors (ICV; arXiv:2311.06668) as a top-down method. While both aim to capture compact representations of broad in-context learning tasks, we find they are effective only on specific types of tasks: ICVs outperform FVs in behavioral shifting, whereas FVs excel in tasks requiring more precision. We discuss the implications for future evaluations of steering methods and for further research into top-down and bottom-up steering given these findings.
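
For readers unfamiliar with vector steering, the general idea can be sketched as adding a scaled vector to a model's hidden activations. The snippet below is a toy illustration under stated assumptions, not the FV or ICV procedure from the cited papers: the function name `steer`, the scaling factor `alpha`, and all numbers are hypothetical, and how each method extracts its vector and where it is applied are left out.

```python
from typing import List

Vector = List[float]


def steer(hidden: Vector, steering_vector: Vector, alpha: float = 1.0) -> Vector:
    """Return hidden + alpha * steering_vector, element-wise.

    Hypothetical helper: a real method would apply this to activations
    inside a model, at layers and positions chosen by that method.
    """
    if len(hidden) != len(steering_vector):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + alpha * v for h, v in zip(hidden, steering_vector)]


# Toy 4-dimensional hidden state and steering vector (made-up numbers).
hidden_state = [0.5, -1.0, 2.0, 0.0]
task_vector = [1.0, 0.0, -1.0, 2.0]

# alpha controls steering strength; alpha = 0 leaves the state unchanged.
steered = steer(hidden_state, task_vector, alpha=0.5)
print(steered)
```

The scaling factor is the main knob in such a sketch: larger values push the activations further along the steering direction, at the risk of degrading the model's other behavior.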
