CL Mar 1, 2023

Competence-Based Analysis of Language Models

arXiv:2303.00333v59 citations h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretability in LLMs for researchers and practitioners, though it is incremental as it builds on existing probing and causal analysis methods.

The authors address the problem of understanding how large language models (LLMs) represent linguistic structure. They introduce CALM, a framework that uses causal probing interventions to analyze LLM competence on specific tasks, and apply it in a case study to explain model behaviors across lexical inference tasks.

Despite the recent successes of large, pretrained neural language models (LLMs), comparatively little is known about the representations of linguistic structure they learn during pretraining, which can lead to unexpected behaviors in response to prompt variation or distribution shift. To better understand these models and behaviors, we introduce a general model analysis framework to study LLMs with respect to their representation and use of human-interpretable linguistic properties. Our framework, CALM (Competence-based Analysis of Language Models), is designed to investigate LLM competence in the context of specific tasks by intervening on models' internal representations of different linguistic properties using causal probing, and measuring models' alignment under these interventions with a given ground-truth causal model of the task. We also develop a new approach for performing causal probing interventions using gradient-based adversarial attacks, which can target a broader range of properties and representations than prior techniques. Finally, we carry out a case study of CALM using these interventions to analyze and compare LLM competence across a variety of lexical inference tasks, showing that CALM can be used to explain behaviors across these tasks.

