CL, LG | Feb 10, 2022

Locating and Editing Factual Associations in GPT

arXiv:2202.05262v5 | 2630 citations
AI Analysis

This work addresses model interpretability and control. It is aimed at AI researchers and practitioners and offers an incremental improvement in model-editing techniques.

The authors study how GPT models store and recall factual associations. They find that these associations are localized in middle-layer feed-forward modules and can be edited directly with Rank-One Model Editing (ROME). ROME performs comparably to existing methods on a standard zero-shot relation extraction (zsRE) editing task and, on a new counterfactual dataset, maintains both specificity and generalization where other methods sacrifice one or the other.

We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/
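To make the idea of a rank-one weight edit concrete, here is a minimal NumPy sketch on a toy linear layer. It treats the weight matrix `W` as a key-value store and inserts one new association `(k_star, v_star)` using the closed-form update W' = W + Λ (C⁻¹ k*)ᵀ, with Λ = (v* − W k*) / ((C⁻¹ k*)ᵀ k*) and C an estimate of the key covariance. This is an illustration under stated assumptions, not the released ROME code: the function name `rank_one_edit`, the random toy data, and the dimensions are all hypothetical, and the real method also has to choose `k_star` and `v_star` from model activations, which is not shown here.

```python
import numpy as np


def rank_one_edit(W, C, k_star, v_star):
    """Return an edited copy of W that maps k_star to v_star.

    W      : (d_out, d_in) weight matrix viewed as a linear key-value store
    C      : (d_in, d_in) covariance estimate of previously stored keys
    k_star : (d_in,) key for the new association
    v_star : (d_out,) value the edited layer should produce for k_star
    """
    u = np.linalg.solve(C, k_star)      # u = C^{-1} k*
    residual = v_star - W @ k_star      # what the current W gets wrong for k*
    lam = residual / (u @ k_star)       # scale so that W' k* equals v* exactly
    return W + np.outer(lam, u)         # rank-one update


# Toy data (hypothetical): random weights and random "previously seen" keys.
rng = np.random.default_rng(0)
d_in, d_out, n_keys = 8, 6, 200
W = rng.normal(size=(d_out, d_in))
K = rng.normal(size=(d_in, n_keys))     # columns are old keys
C = K @ K.T / n_keys                    # empirical key covariance

k_star = rng.normal(size=d_in)
v_star = rng.normal(size=d_out)
W_new = rank_one_edit(W, C, k_star, v_star)
```

After the edit, `W_new @ k_star` reproduces `v_star` up to floating-point error, and `W_new - W` has rank one. Weighting the update direction by C⁻¹ is what keeps the change small for keys that resemble those already stored.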

Code Implementations (4 repos)
