Continuous Interpretive Steering for Scalar Diversity
This work addresses the challenge of assessing graded pragmatic sensitivity in LLMs, a problem relevant to natural language processing researchers. Its contribution is incremental, since it builds on existing activation steering methods.
The study evaluates graded pragmatic inference in large language models (LLMs) by introducing Continuous Interpretive Steering (CIS) and a new dataset, GraSD. It shows that graded activation steering yields differentiated interpretive shifts aligned with scalar diversity grades, whereas uniform steering collapses item-level variation.
Pragmatic inference is inherently graded: different lexical items give rise to pragmatic enrichment to different degrees. Scalar implicature exemplifies this property through scalar diversity, where implicature strength varies across scalar items. However, evaluations of pragmatic inference in large language models (LLMs) often rely on prompt-based manipulations. To move beyond prompt-level effects, this study introduces Continuous Interpretive Steering (CIS), a method that probes graded pragmatic interpretation by treating activation-level steering strength as a continuous experimental variable. To support this analysis, the study also introduces a new dataset, GraSD, which encodes graded scalar diversity. Experiments on four LLMs show that uniform activation steering increases pragmatic interpretations globally but collapses item-level variation, whereas graded activation steering yields differentiated interpretive shifts aligned with scalar diversity grades. This result indicates that graded sensitivity is encoded in the representation space and can be systematically recovered through controlled intervention. Together, CIS and GraSD provide a principled framework for evaluating graded pragmatic sensitivity in LLMs.
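
The contrast between uniform and graded steering can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy hidden states, the random steering direction, the `grades` values, `base_alpha`, and the linear mapping from grade to steering strength are all hypothetical, chosen only to show what it means to treat steering strength as a continuous variable. A real experiment would apply the same additive shift to a model's internal activations.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Add alpha times the unit-normalised steering direction to each hidden state."""
    unit = direction / np.linalg.norm(direction)
    alpha = np.asarray(alpha, dtype=float)
    # A scalar alpha broadcasts to every item (uniform steering);
    # a vector supplies one strength per item (graded steering).
    return hidden + alpha.reshape(-1, 1) * unit

def projection_shift(before, after, direction):
    """Measure how far each item moved along the steering direction."""
    unit = direction / np.linalg.norm(direction)
    return (after - before) @ unit

rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 8))      # toy activations for three scalar items
direction = rng.normal(size=8)        # hypothetical "pragmatic reading" direction
grades = np.array([0.2, 0.5, 1.0])    # hypothetical scalar diversity grades
base_alpha = 4.0                      # hypothetical maximum steering strength

uniform = steer(hidden, direction, base_alpha)
graded = steer(hidden, direction, base_alpha * grades)

uniform_shift = projection_shift(hidden, uniform, direction)
graded_shift = projection_shift(hidden, graded, direction)
print(uniform_shift, graded_shift)
```

In this toy setting, uniform steering moves every item by the same amount along the direction, so item-level differences in the shift vanish, while graded steering produces shifts proportional to each item's grade. Sweeping `base_alpha` over a range of values would give the continuous steering-strength axis that the abstract describes.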