Sense Representations Are Inducible Interfaces
For practitioners who need interpretable and controllable LMs, ACROS provides a way to add a sense-level interface to an existing pretrained model without retraining the base model.
ACROS induces explicit sense representations into a frozen pretrained decoder LM via gated residual addition, enabling zero-shot word-sense disambiguation (WSD; 64.95 F1 on Raganato ALL), lexical steering (a non-oracle proxy recovers about 90% of positive shifts), and cross-lingual adaptation to four languages (mean R@1 0.988) while preserving base LM quality.
Sense representations (explicit, per-token meaning decompositions) are useful for disambiguation, steering, and cross-lingual alignment, but existing approaches require models to be pretrained with sense structure baked in. We introduce ACROS, which induces an explicit sense pathway into a frozen pretrained decoder LM through a gated residual addition. On SmolLM2-360M, ACROS preserves base LM quality while supporting three uses of the same induced variables: zero-shot word-sense disambiguation (64.95 F1 on Raganato ALL, competitive with the WordNet first-sense heuristic), low-KL lexical steering across 5,161 CoInCo cases where a simple non-oracle proxy recovers about 90% of positive shifts, and SENSIA cross-lingual adaptation to four languages (mean R@1 0.988, target FLORES PPL 7.94). ACROS makes sense representations an inducible interface for ordinary pretrained LMs.
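To make the phrase "gated residual addition" concrete, the following is a minimal sketch under stated assumptions, not the ACROS implementation. The abstract does not specify the gate's form or how senses are mixed, so everything here is hypothetical: the function name `gated_sense_residual`, the softmax weighting over a small set of per-token sense vectors, and the scalar tanh gate. The one property the sketch is meant to illustrate is that a gate starting at zero leaves the frozen model's hidden state unchanged, which is one plausible way a sense pathway could be added while preserving base LM quality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_sense_residual(hidden, sense_vectors, sense_scores, gate_param):
    """Add a gated mixture of sense vectors to one token's hidden state.

    ASSUMPTIONS (not taken from the source): senses are mixed by softmax
    weights, and the gate is a scalar tanh(gate_param).

    hidden:        list[float], hidden state from the frozen LM
    sense_vectors: list[list[float]], one vector per candidate sense
    sense_scores:  list[float], unnormalized score per sense
    gate_param:    float, learnable gate parameter (0.0 means "off")
    """
    weights = softmax(sense_scores)
    dim = len(hidden)
    # Weighted sum of the sense vectors, one coordinate at a time.
    mixed = [
        sum(w * vec[i] for w, vec in zip(weights, sense_vectors))
        for i in range(dim)
    ]
    gate = math.tanh(gate_param)
    # Residual addition: the frozen hidden state plus the gated sense update.
    updated = [h + gate * m for h, m in zip(hidden, mixed)]
    return updated, weights


# Hypothetical toy inputs: a 3-dimensional hidden state and two senses.
hidden = [0.5, -1.0, 2.0]
senses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scores = [2.0, 0.0]

unchanged, weights = gated_sense_residual(hidden, senses, scores, gate_param=0.0)
shifted, _ = gated_sense_residual(hidden, senses, scores, gate_param=1.0)
```

In this sketch the returned `weights` are the explicit per-token variables: reading off the largest weight would correspond to disambiguation, and overriding the scores would correspond to steering. Whether ACROS exposes its sense variables in this way is an assumption of the example.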