CL · Jan 28

A Computational Approach to Language Contact -- A Case Study of Persian

arXiv:2601.20592v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

This research addresses selective contact effects in language models, a question of interest to linguists and computational researchers. Its contribution is incremental, since it builds on existing probing methods.

The study investigated how language contact influences intermediate representations in a monolingual Persian language model, finding that universal syntactic information is largely unaffected by historical contact, while morphological features like Case and Gender are strongly shaped by language-specific structure.

We investigate structural traces of language contact in the intermediate representations of a monolingual language model. Focusing on Persian (Farsi) as a historically contact-rich language, we probe the representations of a Persian-trained model when exposed to languages with varying degrees and types of contact with Persian. Our methodology quantifies the amount of linguistic information encoded in intermediate representations and assesses how this information is distributed across model components for different morphosyntactic features. The results show that universal syntactic information is largely insensitive to historical contact, whereas morphological features such as Case and Gender are strongly shaped by language-specific structure, suggesting that contact effects in monolingual language models are selective and structurally constrained.
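The abstract does not say how the "amount of linguistic information" in a layer is measured. A common proxy in probing work is the accuracy of a simple classifier trained on frozen representations from each layer. The following is a minimal sketch under that assumption. It uses a nearest-centroid probe, which is our substitution and not necessarily the authors' method. The function names, the Case labels `Nom`/`Acc`, and the toy vectors are hypothetical illustrations, not data or results from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Assumption: a nearest-centroid probe stands in for the paper's
# (unspecified) measure of encoded linguistic information.


def centroids(xs: List[Vector], ys: List[str]) -> Dict[str, List[float]]:
    """Return the mean representation for each label (e.g. each Case value)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(cents: Dict[str, List[float]], x: Vector) -> str:
    """Pick the label with the nearest centroid (squared Euclidean).

    Ties go to the label seen first in the training data.
    """
    return min(cents, key=lambda y: sum((c - v) ** 2 for c, v in zip(cents[y], x)))


def probe_accuracy(train_x, train_y, test_x, test_y) -> float:
    """Fit centroids on the training split and score the held-out split."""
    cents = centroids(train_x, train_y)
    correct = sum(predict(cents, x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)


def layerwise_probe(train_by_layer, train_y, test_by_layer, test_y) -> Dict[int, float]:
    """Probe accuracy per layer: a rough view of where a feature is recoverable."""
    return {
        layer: probe_accuracy(train_by_layer[layer], train_y, test_by_layer[layer], test_y)
        for layer in sorted(train_by_layer)
    }


if __name__ == "__main__":
    # Hypothetical toy data: layer 0 carries no Case signal, layer 1 separates it.
    train_y = ["Nom", "Nom", "Acc", "Acc"]
    test_y = ["Nom", "Acc"]
    train = {0: [[0.5, 0.5]] * 4, 1: [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]}
    test = {0: [[0.5, 0.5]] * 2, 1: [[0.8, 0.2], [0.2, 0.8]]}
    print(layerwise_probe(train, train_y, test, test_y))  # {0: 0.5, 1: 1.0}
```

In practice, per-layer accuracies like these would be compared against a majority-class baseline, and across probing languages with different contact histories with Persian. A layer scoring near chance (0.5 in the toy layer 0) encodes little that this probe can use.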

