The GRADIEND Python Package: An End-to-End System for Gradient-Based Feature Learning
This work gives researchers and practitioners in natural language processing a practical tool for gradient-based feature learning, but the contribution appears incremental: it builds on existing methods and reproduces prior use cases.
The authors address the problem of learning feature directions from gradients in language models by introducing the GRADIEND Python package. The package provides an end-to-end system for data creation, training, evaluation, and model rewriting, and is demonstrated on an English pronoun paradigm and on large-scale feature comparisons.
We present gradiend, an open-source Python package that operationalizes the GRADIEND method for learning feature directions from factual-counterfactual gradients of language models, computed under masked language modeling (MLM) and causal language modeling (CLM) objectives. The package provides a unified workflow for feature-related data creation, training, evaluation, visualization, persistent model rewriting via controlled weight updates, and multi-feature comparison. We demonstrate GRADIEND on an English pronoun paradigm and on a large-scale feature comparison that reproduces prior use cases.
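To make the idea of factual-counterfactual gradients and controlled weight updates concrete, the following is a minimal sketch in plain Python. It does not use the gradiend package or reproduce its API. The toy one-layer "model", the three-word vocabulary, the context vector `x`, the step size `alpha`, and all function names are assumptions made for illustration. A plain gradient difference stands in here for the feature direction that the GRADIEND method learns.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, x):
    """Logits of a toy linear layer: one row of W per vocabulary entry."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def loss_gradient(W, x, target):
    """Gradient of -log softmax(W x)[target] with respect to W."""
    p = softmax(logits(W, x))
    err = [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]
    return [[e * xi for xi in x] for e in err]

random.seed(0)
vocab = ["he", "she", "they"]          # hypothetical pronoun targets
HE, SHE = vocab.index("he"), vocab.index("she")

# Toy stand-in for a language model head and a masked-context representation.
W = [[random.gauss(0, 1) for _ in range(4)] for _ in vocab]
x = [random.gauss(0, 1) for _ in range(4)]

# Gradients for the factual target ("he") and the counterfactual target ("she").
g_factual = loss_gradient(W, x, HE)
g_counterfactual = loss_gradient(W, x, SHE)

# Assumed simplification: use the raw gradient difference as the direction.
direction = [
    [gf - gc for gf, gc in zip(row_f, row_c)]
    for row_f, row_c in zip(g_factual, g_counterfactual)
]

# Controlled weight update: a scaled step along the direction.
alpha = 0.5  # hypothetical step size
W_rewritten = [
    [w + alpha * d for w, d in zip(row_w, row_d)]
    for row_w, row_d in zip(W, direction)
]

p_before = softmax(logits(W, x))
p_after = softmax(logits(W_rewritten, x))
for token, before, after in zip(vocab, p_before, p_after):
    print(f"{token}: {before:.3f} -> {after:.3f}")
```

In this toy setting, the update raises the probability of the counterfactual pronoun and lowers that of the factual one, while the rows of `direction` for unrelated tokens are zero. A negative `alpha` would shift the model the other way, which is the sense in which the update is controlled.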