CL · Sep 24, 2024

Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability

arXiv:2409.15827v2 · 22 citations · h-index: 7
Originality: Incremental advance
AI Analysis

The paper offers a new interpretability approach to language competence in LLMs, but the advance is incremental: it applies established psycholinguistic methods to an existing model.

The study used psycholinguistic tasks to probe neuron-level representations in GPT-2-XL, finding that it shows human-like abilities in sound-gender association and implicit causality but struggles with sound-shape association, with specific neurons linked to these competencies.

As large language models (LLMs) advance in their linguistic capacity, understanding how they capture aspects of language competence remains a significant challenge. This study therefore employs psycholinguistic paradigms in English, which are well-suited for probing deeper cognitive aspects of language processing, to explore neuron-level representations in a language model across three tasks: sound-shape association, sound-gender association, and implicit causality. Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality. Targeted neuron ablation and activation manipulation reveal a crucial relationship: when GPT-2-XL displays a linguistic ability, specific neurons correspond to that competence; conversely, the absence of such an ability indicates a lack of specialized neurons. This study is the first to utilize psycholinguistic experiments to investigate deep language competence at the neuron level, providing a new level of granularity in model interpretability and insights into the internal mechanisms driving language ability in transformer-based LLMs.
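To make "targeted neuron ablation and activation manipulation" concrete, here is a minimal sketch on a toy two-layer network. It is not the authors' code and does not touch GPT-2-XL: the weights, the input, and the names `mlp_forward` and `ablation_effects` are all hypothetical, chosen only to illustrate the idea of zeroing (or rescaling) one hidden unit and measuring how the probability of a target output changes.

```python
import math

def mlp_forward(x, w_in, w_out, ablate=(), scale=None):
    """Toy ReLU MLP. `ablate` zeroes the listed hidden neurons;
    `scale` maps a neuron index to a multiplier on its activation."""
    scale = scale or {}
    hidden = []
    for j, row in enumerate(w_in):
        act = max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU
        if j in ablate:
            act = 0.0                 # ablation: silence the neuron
        elif j in scale:
            act *= scale[j]           # activation manipulation
        hidden.append(act)
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ablation_effects(x, w_in, w_out, target):
    """Drop in target probability when each hidden neuron is ablated alone."""
    base = softmax(mlp_forward(x, w_in, w_out))[target]
    return {
        j: base - softmax(mlp_forward(x, w_in, w_out, ablate={j}))[target]
        for j in range(len(w_in))
    }

# Hypothetical toy weights: 2 inputs, 3 hidden neurons, 2 output classes.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w_out = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]]

effects = ablation_effects(x, w_in, w_out, target=0)
most_important = max(effects, key=effects.get)
```

In this toy setup, a neuron whose ablation causes a large drop in the target probability plays the role of a "competence" neuron for that output, while a neuron that is inactive on the input has no effect when removed. The study's actual selection and manipulation procedure may differ.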

