AI · Apr 20

State Transfer Reveals Reuse in Controlled Routing

arXiv:2604.18158 · 35.2 · h-index: 16
AI Analysis

For researchers studying model interpretability and state reuse, this work distinguishes fixed-interface reuse from prompt relocation, providing a more rigorous test for identifying where behavior is represented.

The paper studies where behaviorally relevant state is represented in controlled routing tasks, finding that fixed-interface transfer provides stronger evidence of reuse than trained prompt success alone. On GPT-2 triop, an early interface supports exact transfer; on GPT-2 add/sub, zero-retrain compiled transfer recovers most donor routing accuracy.

Prompt-based interventions can change model behavior, but trained success alone does not identify where the behaviorally relevant state is represented. We study this question in controlled routing tasks using interfaces chosen on support data, held-out query evaluation, and matched necessity, sufficiency, and wrong-interface controls. On GPT-2 triop, an early interface supports exact transfer under these tests. On GPT-2 add/sub, zero-retrain compiled transfer at the fixed interface recovers most of donor routing accuracy, while trainable prompt slots can relearn the same behavior at several other positions only after additional support examples and optimization. These results distinguish fixed-interface reuse from prompt relocation in a setting where the two can be tested directly. Qwen routing provides a cross-architecture consistency check for the same matched-interface pattern at the operator token, although donor-specific identity on the local V-path remains unresolved. Generation and reasoning branches are used to map scope: they show broader transport or weaker controller identifiability once control depends on longer trajectories or harder selection. In controlled routing, fixed-interface transfer is therefore stronger evidence of reuse than trained prompt success alone.
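The three matched controls named in the abstract (sufficiency, necessity, wrong-interface) can be illustrated on a toy system. The sketch below is not the paper's code or models: it assumes an "interface" means a single token position whose hidden state is copied between a donor run and a recipient run, and every name in it (`encode`, `readout`, `patch`, `evaluate`, `OP_POS`) is hypothetical. The toy readout routes to addition or subtraction based only on the operator-position state, so the expected pattern is built in by construction.

```python
import numpy as np

OP_POS = 1  # hypothetical interface: the operator position in [a, op, b]

def encode(a, op, b):
    """Toy hidden states, one 2-d vector per token position."""
    op_vec = np.array([1.0, 0.0]) if op == "add" else np.array([0.0, 1.0])
    return np.stack([np.array([float(a), 0.0]), op_vec, np.array([float(b), 0.0])])

def readout(h):
    """Toy readout: routes to add or sub based on the operator-position state."""
    a, b = h[0, 0], h[2, 0]
    return a + b if h[OP_POS, 0] > h[OP_POS, 1] else a - b

def patch(target_h, source_h, pos):
    """Copy the source state at one position into a copy of the target states."""
    out = target_h.copy()
    out[pos] = source_h[pos]
    return out

def evaluate(n_queries=50, seed=0):
    rng = np.random.default_rng(seed)
    queries = rng.integers(1, 10, size=(n_queries, 2))  # b >= 1, so a+b != a-b
    suff, wrong, nec = [], [], []
    for a, b in queries:
        donor = encode(a, "add", b)      # donor run routes to addition
        recipient = encode(a, "sub", b)  # recipient run routes to subtraction
        donor_answer = a + b
        # Sufficiency: donor state at the interface, everything else recipient.
        suff.append(readout(patch(recipient, donor, OP_POS)) == donor_answer)
        # Wrong-interface control: same patch at a non-interface position.
        wrong.append(readout(patch(recipient, donor, 0)) == donor_answer)
        # Necessity: overwrite the donor's interface state with the recipient's.
        nec.append(readout(patch(donor, recipient, OP_POS)) == donor_answer)
    return {k: float(np.mean(v)) for k, v in
            [("sufficiency", suff), ("wrong_interface", wrong), ("necessity_ablated", nec)]}

results = evaluate()
print(results)
```

Each value is the fraction of held-out queries on which the patched run reproduces the donor's routing. In this toy the interface patch fully transfers donor behavior, while the wrong-position patch and the ablated donor do not; in a real model these rates would be graded rather than all-or-nothing, which is why the abstract reports "most of" donor accuracy for the add/sub case.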

