CL · AI · HC · LG · Feb 27, 2023

Inseq: An Interpretability Toolkit for Sequence Generation Models

arXiv:2302.13942v3 · 252 citations · h-index: 35
Originality: Synthesis-oriented
AI Analysis

This work addresses the shortage of interpretability tools for sequence generation models in natural language processing by providing a centralized toolkit for researchers and practitioners. The contribution is incremental, since it builds on existing techniques.

The authors introduce Inseq, a Python library that enables intuitive extraction of internal information and feature importance scores from Transformer models. They demonstrate it by highlighting gender biases in machine translation models and by locating factual knowledge in GPT-2.

Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformer architectures. We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.
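
To make the idea of contrastive feature attribution concrete, here is a minimal sketch on a toy linear model. It does not use Inseq's API, and it is not the authors' implementation: the function names, the bag-of-embeddings model, the token list, and all numbers are assumptions invented for illustration. Each input token is scored by gradient x input with respect to the difference between a target logit and a foil logit, which asks "which inputs make the model prefer the target over the foil?".

```python
# Toy contrastive gradient-x-input attribution.
# Everything here (names, model, numbers) is hypothetical and is NOT Inseq's API.

def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def logits(token_embeddings, output_weights):
    """Assumed toy model: logit[k] = sum over tokens of dot(W[k], embedding)."""
    return {
        k: sum(dot(w, emb) for emb in token_embeddings)
        for k, w in output_weights.items()
    }

def contrastive_attribution(token_embeddings, output_weights, target, foil):
    """Score each input token by how much it favors `target` over `foil`.

    For the toy model above, the gradient of (logit[target] - logit[foil])
    with respect to any token embedding is W[target] - W[foil]. Multiplying
    that gradient by the embedding and summing over dimensions gives the
    gradient-x-input score for the token.
    """
    grad = [t - f for t, f in zip(output_weights[target], output_weights[foil])]
    return [dot(grad, emb) for emb in token_embeddings]

# Made-up source tokens and 2-dimensional embeddings.
tokens = ["the", "doctor", "said"]
embeddings = [[0.1, 0.0], [0.9, -0.2], [0.0, 0.3]]
# Made-up output weights for two candidate next tokens.
weights = {"he": [1.0, 0.5], "she": [-1.0, 0.5]}

scores = contrastive_attribution(embeddings, weights, target="he", foil="she")
for tok, score in zip(tokens, scores):
    print(f"{tok:>7}: {score:+.2f}")
```

Because the toy model is linear, the scores sum exactly to the logit difference between target and foil. Real Transformers are nonlinear, so gradient x input only approximates each token's contribution there.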

Code Implementations: 2 repos