CL
Apr 2, 2025

Language Models at the Syntax-Semantics Interface: A Case Study of the Long-Distance Binding of Chinese Reflexive ziji

arXiv:2504.02116v120 citations
h-index: 2
Has Code
COLING
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of evaluating language models on complex linguistic phenomena, of interest to researchers in NLP and linguistics, but its contribution is incremental because it focuses on a single case study.

The paper investigates whether language models can resolve the binding patterns of the Chinese reflexive ziji. None of the 21 models tested consistently replicated human judgments; the models relied on sequential cues and overlooked semantic and syntactic constraints.

This paper explores whether language models can effectively resolve the complex binding patterns of the Mandarin Chinese reflexive ziji, which are constrained by both syntactic and semantic factors. We construct a dataset of 240 synthetic sentences using templates and examples from syntactic literature, along with 320 natural sentences from the BCC corpus. Evaluating 21 language models against this dataset and comparing their performance to judgments from native Mandarin speakers, we find that none of the models consistently replicates human-like judgments. The results indicate that existing language models tend to rely heavily on sequential cues, though they do not always favor the closest strings, and often overlook subtle semantic and syntactic constraints. They tend to be more sensitive to noun-related than verb-related semantics.

Code Implementations: 1 repo