CL AI May 27

Sense Representations Are Inducible Interfaces

arXiv:2605.28669 59.4
AI Analysis

For practitioners needing interpretable and controllable LMs, ACROS provides a method to add sense-level interfaces to existing pretrained models without retraining.

ACROS induces explicit sense representations into a frozen pretrained decoder LM via gated residual addition, enabling zero-shot WSD (64.95 F1), lexical steering (90% positive shift recovery), and cross-lingual adaptation (mean R@1 0.988) while preserving base LM quality.

Sense representations (explicit, per-token meaning decompositions) are useful for disambiguation, steering, and cross-lingual alignment, but existing approaches require models to be pretrained with sense structure baked in. We introduce ACROS, which induces an explicit sense pathway into a frozen pretrained decoder LM through a gated residual addition. On SmolLM2-360M, ACROS preserves base LM quality while supporting three uses of the same induced variables: zero-shot word-sense disambiguation (64.95 F1 on Raganato ALL, competitive with the WordNet first-sense heuristic), low-KL lexical steering across 5,161 CoInCo cases where a simple non-oracle proxy recovers about 90% of positive shifts, and SENSIA cross-lingual adaptation to four languages (mean R@1 0.988, target FLORES PPL 7.94). ACROS makes sense representations an inducible interface for ordinary pretrained LMs.
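The abstract names the mechanism (a gated residual addition onto a frozen decoder) but does not give its parameterization, so the following is only a minimal sketch of what such a step could look like for a single token. Everything here is an assumption made for illustration: the function name `gated_sense_residual`, the softmax over a small set of sense vectors, and the scalar sigmoid gate are hypothetical and are not taken from ACROS itself.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(scores):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_sense_residual(hidden, sense_vectors, gate_weights, gate_bias):
    """Hypothetical gated residual addition for one token.

    hidden        -- hidden state from the frozen base LM (left unchanged)
    sense_vectors -- candidate sense vectors for this token (assumed given)
    gate_weights, gate_bias -- parameters of an assumed scalar gate
    Returns (new_hidden, sense_probs, gate).
    """
    # Explicit sense variables: a distribution over the candidate senses.
    sense_probs = softmax([dot(hidden, s) for s in sense_vectors])
    # Sense pathway output: probability-weighted mix of the sense vectors.
    sense_out = [
        sum(p * s[i] for p, s in zip(sense_probs, sense_vectors))
        for i in range(len(hidden))
    ]
    # Scalar gate in (0, 1) controls how much of the pathway is added.
    gate = sigmoid(dot(gate_weights, hidden) + gate_bias)
    new_hidden = [h + gate * o for h, o in zip(hidden, sense_out)]
    return new_hidden, sense_probs, gate


hidden = [1.0, 0.0]
senses = [[1.0, 0.0], [0.0, 1.0]]
new_hidden, sense_probs, gate = gated_sense_residual(hidden, senses, [0.0, 0.0], 0.0)
# A strongly negative bias closes the gate, so the base hidden state passes through.
closed_hidden, _, closed_gate = gated_sense_residual(hidden, senses, [0.0, 0.0], -1e9)
```

In this sketch, a closed gate returns the frozen model's hidden state untouched, which is one simple way an added pathway can leave base LM behavior intact, and `sense_probs` is the kind of explicit per-token variable that could be read out for disambiguation or adjusted for steering.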
