CL · AI · Jun 2

Fully Automated Identification of Lexical Alignment and Preference-Stage Shifts in Large Language Models

Thomas Stephan Juzek, Xiaoyang Ming, Jose A. Hernandez

arXiv:2606.03165 · 16.8

Predicted impact: top 22% in CL (last 90 days) · Originality: Incremental advance

AI Analysis

For researchers studying LLM alignment, this provides a scalable, assumption-light method to systematically analyze lexical misalignment, though it is an incremental improvement over existing manual approaches.

This paper introduces two automated metrics, the Lexical Alignment Score and the Triangulated Preference Shift, that detect lexical overuse in LLMs and attribute it to preference learning. Applied to PubMed abstracts across six model families, they replicate prior findings without manual curation.

The language used by digital chat assistants such as ChatGPT can diverge from human expectations (misalignment). Research, mostly on Scientific English, has described which divergences occur and, to some extent, why they occur, linking them to the training stage of human preference learning. However, existing approaches rely on manual curation. This paper introduces two curation-free, assumption-light evaluation metrics: the Lexical Alignment Score, which identifies lexical overuse, and the Triangulated Preference Shift, which quantifies how much of such shifts can be attributed to human preference learning. Continuations of PubMed abstracts were generated with models from six families (Falcon, Gemma, Llama, Mistral, OLMo, Yi) and measured using windowed document prevalence. Without manual intervention, the procedure identifies overused items such as 'suggest', 'additionally', and 'strategy' and estimates their link to preference learning. Our findings replicate prior work and remain stable across parameter settings, random seeds, and evaluation on further data. The approach scales readily and enables systematic study of lexical (mis)alignment beyond Scientific English and across languages. As such, the metrics could contribute both to better alignment in future models and to a better understanding of where misalignment originates.
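The abstract names its ingredients (document prevalence over a window, a comparison of model output against human text, and an attribution to the preference-learning stage) but does not give formulas. The sketch below is only an illustration under stated assumptions: document prevalence is taken to be the share of documents containing a word at least once, "windowed" is read as restricting each document to its first `window` tokens, overuse is a smoothed ratio of model to human prevalence, and the preference-stage share compares human, base-model, and preference-tuned prevalence. Every function name, the smoothing constant, and both formulas are hypothetical; this is not the authors' Lexical Alignment Score or Triangulated Preference Shift.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic tokens; a deliberately simple stand-in for real tokenization.
    return re.findall(r"[a-z]+", text.lower())


def document_prevalence(docs, window=None):
    """Share of documents in which each word occurs at least once.

    Hypothetical reading of "windowed": if `window` is given, only the first
    `window` tokens of each document are considered.
    """
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        if window is not None:
            tokens = tokens[:window]
        counts.update(set(tokens))  # count each word once per document
    n = len(docs)
    return {word: c / n for word, c in counts.items()}


def overuse_ratios(model_docs, human_docs, window=None, smoothing=1e-3):
    """Hypothetical overuse score: smoothed ratio of model to human prevalence."""
    model_p = document_prevalence(model_docs, window)
    human_p = document_prevalence(human_docs, window)
    return {
        word: (p + smoothing) / (human_p.get(word, 0.0) + smoothing)
        for word, p in model_p.items()
    }


def preference_shift_share(human, base, tuned):
    """Hypothetical share of the human-to-tuned prevalence gap that arises
    between the base model and the preference-tuned model."""
    gap = tuned - human
    if gap == 0:
        return 0.0  # no divergence from human usage, so nothing to attribute
    return (tuned - base) / gap


# Toy usage with invented sentences (not data from the paper).
human_docs = ["We show that X is linked to Y.", "Results indicate a role for Z."]
model_docs = ["These results suggest a novel strategy.", "Additionally, we suggest Z."]
ratios = overuse_ratios(model_docs, human_docs)
top = sorted(ratios, key=ratios.get, reverse=True)
print(top[0])  # prints: suggest
```

In this toy case, 'suggest' ranks first because it appears in every model document and in no human document. A word whose prevalence rises mainly between the base and the tuned model would receive a `preference_shift_share` close to 1.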

