CL · Nov 14, 2023

How Well Do Large Language Models Understand Syntax? An Evaluation by Asking Natural Language Questions

Houquan Zhou, Yang Hou, Zhenghua Li, Xuebin Wang, Zhefeng Wang, Xinyu Duan, Min Zhang

arXiv:2311.08287v12.912 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work assesses whether LLMs genuinely comprehend syntax, a question that matters to researchers and developers working to improve machine language understanding. The contribution is incremental, since it builds on existing evaluation methods.

The study evaluated 24 large language models (LLMs) on natural language questions targeting nine syntactic knowledge points. Most models showed a limited grasp of syntax, with notable discrepancies across knowledge points: prepositional phrase attachment was the hardest, while adjectival modifiers and indirect objects were easier to handle.

While recent advancements in large language models (LLMs) bring us closer to achieving artificial general intelligence, the question persists: Do LLMs truly understand language, or do they merely mimic comprehension through pattern recognition? This study seeks to explore this question through the lens of syntax, a crucial component of sentence comprehension. Adopting a natural language question-answering (Q&A) scheme, we craft questions targeting nine syntactic knowledge points that are most closely related to sentence comprehension. Experiments conducted on 24 LLMs suggest that most have a limited grasp of syntactic knowledge, exhibiting notable discrepancies across different syntactic knowledge points. In particular, questions involving prepositional phrase attachment pose the greatest challenge, whereas those concerning adjectival modifier and indirect object are relatively easier for LLMs to handle. Furthermore, a case study on the training dynamics of the LLMs reveals that the majority of syntactic knowledge is learned during the initial stages of training, hinting that simply increasing the number of training tokens may not be the "silver bullet" for improving the comprehension ability of LLMs.
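To make the Q&A scheme described in the abstract more concrete, here is a minimal sketch of how answers might be scored per syntactic knowledge point. It is not the authors' implementation: the `SyntaxQuestion` class, the knowledge-point labels, the example questions, and the exact-match scoring rule are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SyntaxQuestion:
    """One natural language question about a sentence (hypothetical structure)."""
    knowledge_point: str  # hypothetical label, e.g. "pp_attachment"
    question: str
    gold_answer: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_knowledge_point(questions, predictions):
    """Return {knowledge_point: fraction of answers that exactly match the gold answer}."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted in zip(questions, predictions):
        total[item.knowledge_point] += 1
        if normalize(predicted) == normalize(item.gold_answer):
            correct[item.knowledge_point] += 1
    return {kp: correct[kp] / total[kp] for kp in total}


# Illustrative items only; these are not taken from the paper's dataset.
questions = [
    SyntaxQuestion(
        "pp_attachment",
        'In "He ate the cake with a fork", which word does "with a fork" modify?',
        "ate",
    ),
    SyntaxQuestion(
        "adjectival_modifier",
        'In "She bought a red car", which word does "red" modify?',
        "car",
    ),
    SyntaxQuestion(
        "indirect_object",
        'In "She gave the boy a book", what is the indirect object of "gave"?',
        "the boy",
    ),
]

# Stand-in model outputs; a real evaluation would collect these from an LLM.
predictions = ["cake", "Car", "the  boy"]

scores = accuracy_by_knowledge_point(questions, predictions)
```

Reporting accuracy separately for each knowledge point, rather than as one aggregate number, is what would expose the kind of discrepancy the abstract describes between prepositional phrase attachment and easier categories.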
