LG, CL · Mar 12, 2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Stanford
arXiv:2403.07809v1 · 55 citations · h-index: 24 · Has Code · NAACL
Originality: Synthesis-oriented
AI Analysis

pyvene provides a tool for researchers in AI interpretability, model editing, and related fields. The contribution is incremental: it builds on existing intervention concepts and offers a new library implementation of them.

The authors address the challenge of performing interventions on model-internal states by introducing pyvene, an open-source Python library for customizable interventions on PyTorch modules. The library supports complex intervention schemes and is released with code and tutorials.
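To make the core operation concrete, here is a minimal sketch of a static interchange intervention (activation patching) written directly with PyTorch forward hooks rather than with pyvene. The toy model and the helpers `get_activation` and `interchange` are hypothetical illustrations, not part of pyvene's API. The model is first run on a `source` input to cache one layer's output, and the chosen units of that cached output then overwrite the same units during a run on a `base` input.

```python
import torch
import torch.nn as nn

# Toy model standing in for any PyTorch network (hypothetical example).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()


def get_activation(model, layer, x):
    """Run `model` on `x` and return the output of the submodule `layer`."""
    cache = {}

    def save(module, inputs, output):
        cache["act"] = output.detach()

    handle = layer.register_forward_hook(save)
    try:
        with torch.no_grad():
            model(x)
    finally:
        handle.remove()  # always detach the hook so later runs are unaffected
    return cache["act"]


def interchange(model, layer, base, source, dims):
    """Run `model` on `base`, overwriting units `dims` of `layer`'s output with source values."""
    source_act = get_activation(model, layer, source)

    def patch(module, inputs, output):
        output = output.clone()
        output[..., dims] = source_act[..., dims]
        return output  # a non-None return value replaces the module's output

    handle = layer.register_forward_hook(patch)
    try:
        with torch.no_grad():
            return model(base)
    finally:
        handle.remove()


base = torch.randn(1, 4)
source = torch.randn(1, 4)
# Patch the first three hidden units (after the ReLU) with their values from `source`.
patched = interchange(model, model[1], base, source, dims=[0, 1, 2])
print(patched)
```

Patching every unit of the layer makes the base run reproduce the source run's output exactly, which is a quick sanity check; patching only a subset probes whether those units alone carry the information that changes the prediction.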

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce pyvene, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. pyvene supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters. We show how pyvene provides a unified and extensible framework for performing interventions on neural models and sharing the intervened-upon models with others. We illustrate the power of the library via interpretability analyses using causal abstraction and knowledge localization. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at https://github.com/stanfordnlp/pyvene.
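Interventions can also carry their own trainable parameters while the underlying model stays frozen. The sketch below is again plain PyTorch with hypothetical names (`AdditiveIntervention` is not a pyvene class): it adds a learned vector to one hidden layer and optimizes only that vector to steer a toy model's predictions toward class 0.

```python
import torch
import torch.nn as nn


class AdditiveIntervention(nn.Module):
    """Hypothetical trainable intervention: adds a learned vector to a hidden state."""

    def __init__(self, dim):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(dim))  # starts as a no-op

    def forward(self, hidden):
        return hidden + self.delta


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for p in model.parameters():
    p.requires_grad_(False)  # the base model stays frozen

intervention = AdditiveIntervention(8)
# The hook's return value replaces the ReLU output in every forward pass.
handle = model[1].register_forward_hook(lambda module, inputs, output: intervention(output))

x = torch.randn(16, 4)
target = torch.zeros(16, dtype=torch.long)  # steer all inputs toward class 0
opt = torch.optim.Adam(intervention.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

with torch.no_grad():
    initial_loss = loss_fn(model(x), target).item()  # delta is zero here

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()  # gradients reach only intervention.delta
    opt.step()

final_loss = loss.item()
handle.remove()
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Because every model parameter has `requires_grad` set to False, the optimizer updates only `intervention.delta`, and removing the hook restores the original model unchanged.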

Code Implementations: 3 repos
