GLUScope: A Tool for Analyzing GLU Neurons in Transformer Language Models
GLUScope is a specialized tool for interpretability researchers working on modern Transformer models. The contribution is incremental: it builds on existing neuron analysis methods and extends them to newer, gated activation functions.
The authors address the challenge of analyzing neurons in Transformer language models that use gated activation functions such as SwiGLU, where understanding the signs of both the gate and the activation is crucial. They developed GLUScope, an open-source tool that shows text examples for each of the four sign combinations to aid interpretability research.
We present GLUScope, an open-source tool for analyzing neurons in Transformer-based language models, intended for interpretability researchers. We focus on more recent models than previous tools do; specifically, we consider gated activation functions such as SwiGLU. This introduces a new challenge: understanding positive activations is not enough. Instead, both the gate and the in activation of a neuron can be positive or negative, leading to four different possible sign combinations that in some cases have quite different functionalities. Accordingly, for any neuron, our tool shows text examples for each of the four sign combinations, and indicates how often each combination occurs. We describe examples of how our tool can lead to novel insights. A demo is available at https://sjgerstner.github.io/gluscope.
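To make the four sign combinations concrete, the following is a minimal sketch, not the GLUScope implementation. It assumes the common SwiGLU formulation in which a neuron's activation is Swish(gate pre-activation) multiplied by the in pre-activation. All function names, labels, and input values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical labels for the four sign combinations (gate sign, in sign).
COMBINATIONS = ("gate+/in+", "gate+/in-", "gate-/in+", "gate-/in-")

def swish(x: float) -> float:
    """Swish (SiLU): x * sigmoid(x). Its output has the same sign as x."""
    return x / (1.0 + math.exp(-x))

def glu_activation(gate_pre: float, in_pre: float) -> float:
    """Assumed SwiGLU neuron activation: Swish(gate) * in."""
    return swish(gate_pre) * in_pre

def sign_combination(gate_pre: float, in_pre: float) -> str:
    """Classify one token position; zero is treated as positive (an assumption)."""
    gate_sign = "+" if gate_pre >= 0 else "-"
    in_sign = "+" if in_pre >= 0 else "-"
    return f"gate{gate_sign}/in{in_sign}"

def combination_frequencies(pairs):
    """Fraction of (gate_pre, in_pre) pairs falling into each combination."""
    counts = Counter(sign_combination(g, i) for g, i in pairs)
    total = sum(counts.values())
    if total == 0:
        return {combo: 0.0 for combo in COMBINATIONS}
    return {combo: counts[combo] / total for combo in COMBINATIONS}

# Made-up pre-activations of one neuron at five token positions.
pairs = [(1.0, 2.0), (1.0, -2.0), (-1.0, 2.0), (-1.0, -2.0), (2.0, 3.0)]
frequencies = combination_frequencies(pairs)
```

Under this formulation, a negative gate combined with a negative in activation yields a positive product, so looking only at positive neuron activations would merge two distinct cases. That is one reason for listing examples per sign combination.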