LG, AI, CL, BM | Dec 17, 2023

Identification of Knowledge Neurons in Protein Language Models

arXiv:2312.10770v1 | 4 citations | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of trust in model predictions for computational biology, but it is incremental as it applies existing interpretability methods to a new domain.

The paper tackles interpretability in protein language models by identifying knowledge neurons in the ESM model. It shows that activation-based and integrated gradient-based selection methods both outperform a random baseline, and that knowledge neurons are densely concentrated in the networks that predict key vectors within the self-attention modules.

Neural language models have become powerful tools for learning complex representations of entities in natural language processing tasks. However, their interpretability remains a significant challenge, particularly in domains like computational biology where trust in model predictions is crucial. In this work, we aim to enhance the interpretability of protein language models, specifically the state-of-the-art ESM model, by identifying and characterizing knowledge neurons: components that express understanding of key information. After fine-tuning the ESM model for the task of enzyme sequence classification, we compare two knowledge neuron selection methods, each of which preserves a subset of neurons from the original model. The two methods, activation-based and integrated gradient-based selection, consistently outperform a random baseline. In particular, these methods show that there is a high density of knowledge neurons in the key vector prediction networks of self-attention modules. Given that key vectors specialize in understanding different features of input sequences, these knowledge neurons could capture knowledge of different enzyme sequence motifs. In the future, the types of knowledge captured by each neuron could be characterized.
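
To make the comparison in the abstract concrete, here is a minimal sketch of the three selection strategies (activation-based, integrated gradient-based, random) on a toy model. It is not the authors' implementation and does not use ESM. Everything in it is an assumption made for illustration: the hidden "neurons" are a synthetic activation matrix, the classifier is a single linear softmax head, integrated gradients use an all-zero baseline over the hidden layer, and the names `make_toy_data`, `activation_scores`, `integrated_gradients`, `top_k`, and `accuracy_keeping` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def class_prob(h, W, b, y):
    """Probability of the true class y given hidden activations h of shape (n, d)."""
    p = softmax(h @ W + b)
    return p[np.arange(len(y)), y]

def grad_class_prob(h, W, b, y):
    """Analytic gradient of the true-class probability with respect to h."""
    p = softmax(h @ W + b)                    # (n, c)
    p_y = p[np.arange(len(y)), y][:, None]    # (n, 1)
    # dP_y/dh_j = P_y * (W[j, y] - sum_c P_c * W[j, c])
    return p_y * (W[:, y].T - p @ W.T)        # (n, d)

def activation_scores(h):
    """Activation-based score: mean absolute activation of each neuron."""
    return np.abs(h).mean(axis=0)

def integrated_gradients(h, W, b, y, steps=200):
    """Per-sample attributions, integrating from a zero baseline (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(h)
    for alpha in alphas:
        grad_sum += grad_class_prob(alpha * h, W, b, y)
    return h * grad_sum / steps

def top_k(scores, k):
    """Indices of the k highest-scoring neurons."""
    return np.argsort(scores)[::-1][:k]

def accuracy_keeping(h, W, b, y, kept):
    """Accuracy when only the neurons in `kept` are preserved (others zeroed)."""
    mask = np.zeros(h.shape[1])
    mask[kept] = 1.0
    pred = ((h * mask) @ W + b).argmax(axis=1)
    return float((pred == y).mean())

def make_toy_data(seed=0, n=200, d=32, c=4):
    """Synthetic data: only the first 2*c neurons carry class information."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, c, size=n)
    h = 0.5 * rng.normal(size=(n, d))
    W = np.zeros((d, c))
    for cls in range(c):
        W[[2 * cls, 2 * cls + 1], cls] = 1.0
        h[y == cls, 2 * cls] += 2.0
        h[y == cls, 2 * cls + 1] += 2.0
    return h, W, np.zeros(c), y

h, W, b, y = make_toy_data()
k = 8
rng = np.random.default_rng(1)
ig_scores = np.abs(integrated_gradients(h, W, b, y)).mean(axis=0)
selections = {
    "activation": top_k(activation_scores(h), k),
    "integrated gradients": top_k(ig_scores, k),
    "random": rng.choice(h.shape[1], size=k, replace=False),
}
for name, kept in selections.items():
    print(f"{name}: accuracy keeping {k} neurons = "
          f"{accuracy_keeping(h, W, b, y, kept):.3f}")
```

In this toy setting the informative neurons are known by construction, so the script only illustrates the mechanics of scoring, selecting, and evaluating a preserved subset. Applying the same idea to a fine-tuned ESM model would require extracting real activations and gradients from the network, which this sketch does not attempt.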

