CL · Nov 12, 2024

Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence

arXiv:2411.07533v3 · 3 citations · h-index: 45 · ACL
Originality: Incremental advance
AI Analysis

This work addresses a problem that matters to researchers and developers: accurately evaluating LLMs' true linguistic competence. Its contribution, a refinement of existing assessment methods, is incremental.

The study investigated the linguistic understanding of Large Language Models (LLMs) by distinguishing between performance and competence using psycholinguistic and neurolinguistic assessment paradigms. It found that LLMs exhibit higher competence and performance in form than in meaning.

This study investigates the linguistic understanding of Large Language Models (LLMs) with respect to the signifier (form) and the signified (meaning) by distinguishing two LLM assessment paradigms: psycholinguistic and neurolinguistic. Traditional psycholinguistic evaluations often reflect statistical rules that may not accurately represent LLMs' true linguistic competence. We introduce a neurolinguistic approach built on a novel method that combines minimal pairs with diagnostic probing to analyze activation patterns across model layers. This method allows a detailed examination of how LLMs represent form and meaning, and of whether these representations are consistent across languages. We found that: (1) psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) direct probability measurement may not accurately assess linguistic competence; (3) instruction tuning changes competence little but improves performance; and (4) LLMs exhibit higher competence and performance in form than in meaning. Additionally, we introduce new conceptual minimal pair datasets for Chinese (COMPS-ZH) and German (COMPS-DE), complementing existing English datasets.
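The two paradigms contrasted in the abstract can be illustrated with a minimal pure-Python sketch. Everything in it is hypothetical: a toy bigram table (`TOY_BIGRAM_LOGP`) stands in for an LLM's token log-probabilities in the psycholinguistic (direct probability) comparison, and synthetic per-layer vectors (`synthetic_layer_activations`) stand in for hidden-state activations in the neurolinguistic (probing) analysis. The probe is plain logistic regression, chosen for illustration; this is not the paper's implementation, and the layer-wise trend it shows is built into the synthetic data rather than being a finding.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# ---- Psycholinguistic paradigm: direct probability comparison ----
# Hypothetical toy bigram log-probabilities standing in for an LLM's scores.
TOY_BIGRAM_LOGP: Dict[Tuple[str, str], float] = {
    ("<s>", "a"): -0.5, ("a", "robin"): -1.0, ("robin", "can"): -0.7,
    ("can", "fly"): -0.4, ("can", "bark"): -3.0,
}
UNSEEN_LOGP = -10.0  # assumed penalty for bigrams missing from the toy table


def sentence_logprob(tokens: Sequence[str]) -> float:
    """Sum of bigram log-probabilities: the 'direct probability' score."""
    prev, total = "<s>", 0.0
    for tok in tokens:
        total += TOY_BIGRAM_LOGP.get((prev, tok), UNSEEN_LOGP)
        prev = tok
    return total


def prefers_acceptable(good: Sequence[str], bad: Sequence[str]) -> bool:
    """A minimal pair is scored correct if the acceptable item is more probable."""
    return sentence_logprob(good) > sentence_logprob(bad)


# ---- Neurolinguistic paradigm: layer-wise diagnostic probing ----
Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(xs: List[Vector], ys: List[int], epochs: int = 200,
                lr: float = 0.5) -> Tuple[Vector, float]:
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def probe_accuracy(w: Vector, b: float, xs: List[Vector], ys: List[int]) -> float:
    """Fraction of held-out items the probe labels correctly."""
    hits = sum(int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1))
               for x, y in zip(xs, ys))
    return hits / len(xs)


def synthetic_layer_activations(layer: int, n: int = 80, dim: int = 4,
                                seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Placeholder activations (label 1 = acceptable, 0 = unacceptable).
    The label signal grows with depth by construction; this is an
    assumption for illustration, not a result."""
    rng = random.Random(seed + layer)
    signal = 1.0 * layer
    xs, ys = [], []
    for k in range(n):
        y = k % 2  # alternate labels so both halves stay balanced
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if y == 1 else -signal
        xs.append(x)
        ys.append(y)
    return xs, ys


def layerwise_probe_curve(num_layers: int = 4) -> List[float]:
    """Train on the first half of each layer's items, test on the second half."""
    accs = []
    for layer in range(num_layers):
        xs, ys = synthetic_layer_activations(layer)
        half = len(xs) // 2
        w, b = train_probe(xs[:half], ys[:half])
        accs.append(probe_accuracy(w, b, xs[half:], ys[half:]))
    return accs
```

With a real model, `sentence_logprob` would instead sum the model's token log-probabilities, and each layer's vectors would be extracted hidden states. A probe that succeeds on held-out items suggests the contrast is encoded at that layer, independently of whether the output probabilities rank the pair correctly, which is the performance/competence distinction the abstract draws.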
