CL · Oct 14, 2025

Language Models Model Language

arXiv:2510.12766v1
Originality: Incremental advance
AI Analysis

The paper addresses theoretical debates in linguistics and AI, offering a practical perspective for researchers and practitioners who work on language models.

The paper responds to speculative linguistic critiques of LLMs by advocating an empiricist framework based on Witold Mańczak's principles: language is the totality of what is said and written, and frequency of use is its primary governing principle. On that basis it offers a constructive guide for LLM design and evaluation.

Linguistic commentary on LLMs, heavily influenced by the theoretical frameworks of de Saussure and Chomsky, is often speculative and unproductive. Critics challenge whether LLMs can legitimately model language, citing the need for "deep structure" or "grounding" to achieve an idealized linguistic "competence." We argue for a radical shift in perspective towards the empiricist principles of Witold Mańczak, a prominent general and historical linguist. He defines language not as a "system of signs" or a "computational system of the brain" but as the totality of all that is said and written. Above all, he identifies frequency of use of particular language elements as language's primary governing principle. Using his framework, we challenge prior critiques of LLMs and provide a constructive guide for designing, evaluating, and interpreting language models.

