LG · AI · HC · Nov 15, 2023

Wrapper Boxes: Faithful Attribution of Model Predictions to Training Data

arXiv:2311.08644v3 · 1 citation · h-index: 3 · Has Code
Originality: Incremental advance
AI Analysis

The approach makes model decisions contestable by identifying the training data responsible for them, addressing the transparency needs of language model users.

The authors tackled the problem of providing faithful explanations for neural model predictions by training interpretable 'wrapper box' models on learned neural features, achieving comparable predictive performance across seven language models while enabling direct attribution of decisions to specific training examples.

Can we preserve the accuracy of neural models while also providing faithful explanations of model decisions to training data? We propose a "wrapper box" pipeline: training a neural model as usual and then using its learned feature representation in classic, interpretable models to perform prediction. Across seven language models of varying sizes, including four large language models (LLMs), two datasets at different scales, three classic models, and four evaluation metrics, we first show that the predictive performance of wrapper classic models is largely comparable to the original neural models. Because classic models are transparent, each model decision is determined by a known set of training examples that can be directly shown to users. Our pipeline thus preserves the predictive performance of neural language models while faithfully attributing classic model decisions to training data. Among other use cases, such attribution enables model decisions to be contested based on responsible training instances. Compared to prior work, our approach achieves higher coverage and correctness in identifying which training data to remove to change a model decision. To reproduce findings, our source code is online at: https://github.com/SamSoup/WrapperBox.
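
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes k-nearest neighbors as the classic model (the abstract does not name the three it evaluates), uses tiny hand-written vectors as hypothetical stand-ins for features extracted from a trained neural model, and contests a decision by naively removing every supporting neighbor. The function name `knn_predict_with_attribution` is invented for illustration.

```python
import math
from collections import Counter


def knn_predict_with_attribution(train_feats, train_labels, query, k=3):
    """Predict a label and return the training indices that determined it."""
    # Euclidean distance from the query to every training feature vector.
    dists = sorted((math.dist(f, query), i) for i, f in enumerate(train_feats))
    neighbors = [i for _, i in dists[:k]]
    # Majority vote among the k nearest training examples.
    votes = Counter(train_labels[i] for i in neighbors)
    label = votes.most_common(1)[0][0]
    return label, neighbors


# Hypothetical stand-ins for neural features (assumption: in a real pipeline
# these would come from the trained model's learned representation).
train_feats = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
               (2.0, 2.1), (2.2, 1.9), (1.9, 2.3)]
train_labels = ["neg", "neg", "neg", "pos", "pos", "pos"]
query = (0.1, 0.1)

# The decision is attributed to a known set of training examples.
label, support = knn_predict_with_attribution(train_feats, train_labels, query)

# Naive contestation (an assumption, not the paper's selection method):
# drop the supporting examples and check whether the decision changes.
kept = [i for i in range(len(train_feats)) if i not in support]
new_label, _ = knn_predict_with_attribution(
    [train_feats[i] for i in kept], [train_labels[i] for i in kept], query
)
```

Because the vote is taken over an explicit neighbor set, the indices in `support` are exactly the training examples that produced the decision, which is what makes the attribution faithful in this toy setting.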

Code Implementations: 1 repo
