CL · LG · Dec 10, 2025

Interpreto: An Explainability Library for Transformers

arXiv:2512.09730v1 · h-index: 4 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for accessible explainability tools for data scientists and end users working with transformer models. Its contribution is incremental: it builds on existing research to create practical tooling.

The authors address explainability for transformer models by introducing Interpreto, a Python library that provides post-hoc explanations for text models. It covers both attribution and concept-based methods and exposes a unified API for classification and generation tasks.

Interpreto is a Python library for post-hoc explainability of HuggingFace text models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and concept-based explanations. The library connects recent research to practical tooling for data scientists, aiming to make explanations accessible to end users. It includes documentation, examples, and tutorials. Interpreto supports both classification and generation models through a unified API. A key differentiator is its concept-based functionality, which goes beyond feature-level attributions and is uncommon in existing libraries. The library is open source and can be installed with pip install interpreto. Code and documentation are available at https://github.com/FOR-sight-ai/interpreto.
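
To make the idea of an attribution concrete, here is a minimal sketch of one classic perturbation-based method, occlusion: remove each token in turn and record how much the model's score changes. This does not use Interpreto's API, which is not described in this summary. The names toy_score, POSITIVE_WEIGHTS, and occlusion_attributions are hypothetical, and the keyword-based scorer is only a stand-in for a real HuggingFace classifier.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a text classifier's "positive" score.
# In practice this would wrap a real model's forward pass.
POSITIVE_WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5}


def toy_score(tokens: List[str]) -> float:
    """Sum fixed per-word weights; unknown words contribute 0."""
    return sum(POSITIVE_WEIGHTS.get(tok.lower(), 0.0) for tok in tokens)


def occlusion_attributions(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Attribute the score to each token by deleting it and measuring the change.

    A positive attribution means the token pushed the score up.
    """
    baseline = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]  # input with token i removed
        attributions.append((tok, baseline - score_fn(occluded)))
    return attributions


if __name__ == "__main__":
    sentence = "The plot was great but boring".split()
    for token, attribution in occlusion_attributions(sentence, toy_score):
        print(f"{token:>8s} {attribution:+.2f}")
```

In this toy setting only "great" and "boring" receive non-zero attributions, since they are the only words the scorer reacts to. Concept-based explanations, the library's other family of methods, work at a different level and are not illustrated here.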

