Large Language Models Are Partially Primed in Pronoun Interpretation
This work examines whether AI models exhibit human-like cognitive biases, a question of interest to researchers in computational linguistics and psycholinguistics. Its contribution is incremental, since it builds on prior psycholinguistic studies.
The study investigates whether large language models (LLMs) adapt their linguistic biases as humans do, testing their pronoun interpretation with stimuli from psycholinguistic experiments. InstructGPT partially adapts, responding to syntactic but not semantic patterns, while FLAN-UL2 fails to produce meaningful patterns.
While a large body of literature suggests that large language models (LLMs) acquire rich linguistic representations, little is known about whether they adapt to linguistic biases in a human-like way. The present study probes this question by asking whether LLMs display human-like referential biases, using stimuli and procedures from real psycholinguistic experiments. Recent psycholinguistic studies suggest that humans adapt their referential biases following recent exposure to referential patterns. Closely replicating three relevant psycholinguistic experiments from Johnson & Arnold (2022) in an in-context learning (ICL) framework, we found that InstructGPT adapts its pronominal interpretations in response to the frequency of referential patterns in the local discourse, though in a limited fashion: adaptation was observed for syntactic biases but not for semantic biases. By contrast, FLAN-UL2 fails to generate meaningful patterns. Our results provide further evidence that contemporary LLMs' discourse representations are sensitive to syntactic patterns in the local context but less so to semantic patterns. Our data and code are available at \url{https://github.com/zkx06111/llm_priming}.