CL · Mar 23

Current LLMs still cannot 'talk much' about grammar modules: Evidence from syntax

arXiv:2603.20114 · 78.9 · h-index: 7
AI Analysis

This highlights a significant limitation of LLMs for linguistics and AI applications, particularly in specialized domains like syntax. The contribution is incremental, as it builds on existing evidence of LLM shortcomings.

The study examined the ability of Large Language Models (LLMs) to accurately translate core syntax terms from English to Arabic, finding that only 25% of ChatGPT-5 translations were accurate, with 38.6% inaccurate and 36.4% partially correct.

We aim to examine the extent to which Large Language Models (LLMs) can 'talk much' about grammar modules, providing evidence from core syntax properties translated by ChatGPT into Arabic. We collected 44 terms from previous works in generative syntax, including books and journal articles, as well as from our experience in the field. These terms were translated by humans, and then by ChatGPT-5. We then analyzed and compared both translations using an analytical and comparative approach. Findings reveal that LLMs still cannot 'talk much' about the core syntax properties embedded in the terms under study, which involve several syntactic and semantic challenges: only 25% of ChatGPT translations were accurate, while 38.6% were inaccurate and 36.4% were partially correct (which we consider appropriate). Based on these findings, a set of actionable strategies was proposed, the most notable of which is close collaboration between AI specialists and linguists to improve LLMs' working mechanisms for accurate or at least appropriate translation.
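As a quick sanity check on the reported figures, the percentages can be converted back into term counts out of the 44 terms. The abstract does not state the raw counts, so the values recovered below (11, 17, 16) are an inference from rounding, not numbers taken from the paper; the function and variable names are illustrative only.

```python
# Sketch: recover per-category term counts from the reported percentages.
# Assumption: each percentage is a count out of 44, rounded to one decimal.

TOTAL_TERMS = 44  # number of syntax terms stated in the abstract


def counts_from_percentages(percentages, total):
    """Map each reported percentage to the nearest whole number of terms."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


reported = {"accurate": 25.0, "inaccurate": 38.6, "partially correct": 36.4}
counts = counts_from_percentages(reported, TOTAL_TERMS)

# Recompute each percentage from its inferred count to confirm it round-trips.
recomputed = {label: round(100 * n / TOTAL_TERMS, 1) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL_TERMS} = {recomputed[label]}%")
```

Under that assumption the three categories sum to 44 and each recomputed percentage matches the reported one, so the figures in the abstract are internally consistent.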
