CL · Apr 26, 2022

LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models

DeepMind
arXiv:2204.12130v2 · 311 citations · h-index: 45 · Has Code
Originality: Incremental advance
AI Analysis

This tool addresses the opacity of language models for researchers and practitioners, offering a practical debugging and intervention framework, though it is incremental as it builds on existing interpretation methods.

The authors tackled the problem of interpreting the internal prediction process in transformer-based language models by introducing LM-Debugger, an interactive tool that provides fine-grained interpretation and allows for user interventions, demonstrated on GPT2 models.

The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred wide interest in interpreting their predictions. However, current interpretation methods mostly focus on probing models from the outside, executing behavioral tests, and analyzing salient input features, while the internal prediction construction process is largely not understood. In this work, we introduce LM-Debugger, an interactive debugger tool for transformer-based LMs, which provides a fine-grained interpretation of the model's internal prediction process, as well as a powerful framework for intervening in LM behavior. For its backbone, LM-Debugger relies on a recent method that interprets the inner token representations and their updates by the feed-forward layers in the vocabulary space. We demonstrate the utility of LM-Debugger for single-prediction debugging by inspecting the internal disambiguation process done by GPT2. Moreover, we show how easily LM-Debugger makes it possible to shift model behavior in a direction of the user's choice, by identifying a few vectors in the network and inducing effective interventions in the prediction process. We release LM-Debugger as an open-source tool and a demo over GPT2 models.
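To make the two ideas in the abstract concrete, here is a minimal toy sketch in plain Python. It assumes that a feed-forward update can be written as a weighted sum of "value vectors", that each vector can be read by projecting it onto an output embedding matrix, and that an intervention amounts to overriding a vector's weight. The vocabulary, matrices, coefficients, and function names below are all hypothetical and chosen for illustration; this is not LM-Debugger's API and not GPT2's actual weights.

```python
# Toy sketch only: every size, value, and name here is an illustrative assumption.
VOCAB = ["cat", "dog", "car", "bus"]

# Hypothetical output embedding matrix: one row per vocabulary token (hidden size 3).
E = [
    [1.0, 0.0, 0.0],  # cat
    [0.9, 0.1, 0.0],  # dog
    [0.0, 1.0, 0.0],  # car
    [0.0, 0.9, 0.2],  # bus
]

# Hypothetical feed-forward value vectors (one per hidden neuron of the layer).
VALUES = [
    [1.0, 0.0, 0.0],  # v_0: happens to promote animal tokens
    [0.0, 1.0, 0.0],  # v_1: happens to promote vehicle tokens
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_vocab(vec):
    """Score every vocabulary token against a hidden-space vector."""
    return {tok: dot(row, vec) for tok, row in zip(VOCAB, E)}


def top_tokens(vec, k=2):
    """Return the k tokens a vector promotes most strongly."""
    scores = project_to_vocab(vec)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def ffn_update(coeffs, overrides=None):
    """Weighted sum of value vectors; overrides replace selected coefficients."""
    overrides = overrides or {}
    update = [0.0, 0.0, 0.0]
    for i, (m, v) in enumerate(zip(coeffs, VALUES)):
        m = overrides.get(i, m)  # intervention: force this vector's weight
        update = [u + m * x for u, x in zip(update, v)]
    return update


def predict(hidden, coeffs, overrides=None):
    """Add the feed-forward update to the hidden state and read off the top token."""
    update = ffn_update(coeffs, overrides)
    return top_tokens([h + u for h, u in zip(hidden, update)], k=1)


hidden = [0.2, 0.1, 0.0]  # hypothetical token representation before the layer
coeffs = [0.5, 0.4]       # hypothetical activation weights for v_0 and v_1

before = predict(hidden, coeffs)
after = predict(hidden, coeffs, overrides={0: 0.0, 1: 3.0})  # switch v_0 off, boost v_1
print(top_tokens(VALUES[0]), top_tokens(VALUES[1]), before, after)
```

In this toy setup, inspecting a vector means listing the tokens it scores highest, and steering the prediction means changing only one or two coefficients while leaving the rest of the computation untouched.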

Code Implementations: 1 repo
