CL · Apr 10, 2024

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models

Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Elena Voita

arXiv:2404.07004v1 · 17.736 citations · h-index: 14 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This tool addresses the challenge of interpreting large Transformer models. It benefits the research community by making the analysis of model components more efficient, though the advance is incremental because it builds on existing interpretability methods.

The researchers developed the LM Transparency Tool, an interactive toolkit for analyzing Transformer language models. It makes the entire prediction process transparent and lets users trace model behavior from top-layer representations down to fine-grained components. The tool supports interpretability by showing the important information flow and attributing changes to specific model parts.

We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Unlike existing tools that focus on isolated parts of the decision-making process, our framework is designed to make the entire prediction process transparent, and allows tracing model behavior back from the top-layer representation to very fine-grained parts of the model. Specifically, it (1) shows the important part of the whole input-to-output information flow, (2) allows attributing any changes made by a model block to individual attention heads and feed-forward neurons, and (3) allows interpreting the functions of those heads or neurons. A crucial part of this pipeline is showing the importance of specific model components at each step. As a result, we are able to look at the roles of model components only in cases where they are important for a prediction. Since knowing which components should be inspected is key for analyzing large models, where the number of these components is extremely high, we believe our tool will greatly support the interpretability community both in research settings and in practical applications.
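
Point (2) of the abstract relies on the fact that an attention block's output is a sum of per-head terms, so the change a block makes can be split across heads. The sketch below illustrates only that decomposition on random toy data. It is not LM-TT's code or API: the function names, the toy dimensions, and the use of the L2 norm as a ranking score are all assumptions made for illustration, and the output-projection bias is ignored.

```python
import math
import random


def head_contribution(head_out, w_o_head):
    """Project one head's output through its slice of the output matrix.

    head_out: list of d_head floats (one token position).
    w_o_head: d_head x d_model nested list (this head's rows of W_O).
    Returns a list of d_model floats: the head's additive term.
    """
    d_model = len(w_o_head[0])
    return [
        sum(head_out[d] * w_o_head[d][m] for d in range(len(head_out)))
        for m in range(d_model)
    ]


def rank_heads(contributions):
    """Rank heads by the L2 norm of their term (an illustrative score only)."""
    scores = [math.sqrt(sum(x * x for x in c)) for c in contributions]
    order = sorted(range(len(scores)), key=lambda h: -scores[h])
    return order, scores


# Toy data standing in for real activations and weights (hypothetical sizes).
random.seed(0)
n_heads, d_head, d_model = 4, 3, 6
head_outs = [[random.gauss(0, 1) for _ in range(d_head)] for _ in range(n_heads)]
w_o = [
    [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_head)]
    for _ in range(n_heads)
]

# One additive term per head.
contributions = [head_contribution(head_outs[h], w_o[h]) for h in range(n_heads)]

# The usual computation: concatenate the heads, multiply by the stacked W_O.
concat = [x for h in range(n_heads) for x in head_outs[h]]
stacked = [row for h in range(n_heads) for row in w_o[h]]
block_output = [
    sum(concat[i] * stacked[i][m] for i in range(len(concat)))
    for m in range(d_model)
]

# Summing the per-head terms recovers the block output.
summed = [sum(c[m] for c in contributions) for m in range(d_model)]
order, scores = rank_heads(contributions)
```

Because the per-head terms add up exactly to the block output, any score computed on them can be used to decide which heads are worth inspecting for a given prediction; the norm used here is just the simplest stand-in.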
