Interpreto: An Explainability Library for Transformers
This work addresses the need for accessible explainability tools for data scientists and end users working with transformer models. Its contribution is incremental, since it builds on existing research to create practical tooling.
The authors address explainability for transformer models by introducing Interpreto, a Python library that provides post-hoc explanations for text models. It covers both attribution and concept-based methods and exposes a unified API for classification and generation tasks.
Interpreto is a Python library for post-hoc explainability of HuggingFace text models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and concept-based explanations. The library connects recent research to practical tooling for data scientists, aiming to make explanations accessible to end users. It includes documentation, examples, and tutorials. Interpreto supports both classification and generation models through a unified API. A key differentiator is its concept-based functionality, which goes beyond feature-level attributions and is uncommon in existing libraries. The library is open source and can be installed with pip install interpreto. Code and documentation are available at https://github.com/FOR-sight-ai/interpreto.
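
To make the idea of a feature-level attribution concrete, here is a minimal sketch of one generic perturbation method, occlusion, written in plain Python. It does not use Interpreto's API, which is not described in this summary. The scorer toy_score, its lexicon, and the function occlusion_attributions are hypothetical names introduced only for illustration; a real setup would score the text with a HuggingFace model instead.

```python
from typing import Callable, List, Tuple

# Hypothetical toy scorer standing in for a real text classifier.
# It sums hand-written sentiment weights; this is an assumption for
# illustration, not a model or API from Interpreto.
LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}


def toy_score(tokens: List[str]) -> float:
    """Return a 'positive sentiment' score for a list of tokens."""
    return sum(LEXICON.get(token.lower(), 0.0) for token in tokens)


def occlusion_attributions(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    mask_token: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Attribute the score to each token by masking it and measuring the drop."""
    baseline = score_fn(tokens)
    attributions = []
    for i, token in enumerate(tokens):
        # Replace only the i-th token and re-score the perturbed input.
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        attributions.append((token, baseline - score_fn(occluded)))
    return attributions


tokens = "The plot was great but the ending was bad".split()
attributions = occlusion_attributions(tokens, toy_score)
for token, value in attributions:
    print(f"{token:>8s} {value:+.1f}")
```

A positive value means the token pushes the score up and a negative value means it pushes the score down; tokens absent from the toy lexicon receive zero. Concept-based explanations, the library's other family of methods, operate on learned internal representations rather than input tokens and are not covered by this sketch.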