CL · AI · Aug 31, 2021

Explaining Classes through Word Attribution

arXiv:2108.13653v1
Originality: Incremental advance
AI Analysis

The paper addresses the need for class-level explanations in text classification, but the advance is incremental because it builds on existing feature attribution techniques.

The paper tackles the problem of explaining how deep learning models view entire classes in text classification by aggregating individual prediction explanations, and it demonstrates the method on Web register classification, finding that it identifies plausible and discriminative keywords for most classes.

In recent years, several methods have been proposed for explaining individual predictions of deep learning models, yet there has been little study of how to aggregate these predictions to explain how such models view classes as a whole in text classification tasks. In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique by aggregating explanations of individual examples in text classification to general descriptions of the classes. We demonstrate the approach on Web register (genre) classification using the XLM-R model and the Corpus of Online Registers of English (CORE), finding that the method identifies plausible and discriminative keywords characterizing all but the smallest class.
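
The aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that per-word attribution scores (for example, from Integrated Gradients) have already been computed for each document, and it uses made-up toy scores. The function name aggregate_class_keywords, the min_docs threshold, and the choice of mean-score ranking are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def aggregate_class_keywords(docs, min_docs=2, top_k=3):
    """Aggregate per-document word attributions into per-class keyword lists.

    docs: list of (label, [(word, score), ...]) pairs, where each score is an
    attribution toward the document's label (assumed precomputed elsewhere).
    """
    # class label -> word -> list of per-document scores
    per_class = defaultdict(lambda: defaultdict(list))
    for label, attributions in docs:
        # Keep only the highest score per word within a document, so that
        # a repeated word counts once per document.
        best = {}
        for word, score in attributions:
            w = word.lower()
            if w not in best or score > best[w]:
                best[w] = score
        for w, s in best.items():
            per_class[label][w].append(s)

    keywords = {}
    for label, word_scores in per_class.items():
        # Drop words seen in fewer than min_docs documents, then rank by
        # mean attribution (ties broken alphabetically).
        ranked = sorted(
            ((sum(s) / len(s), w) for w, s in word_scores.items() if len(s) >= min_docs),
            key=lambda pair: (-pair[0], pair[1]),
        )
        keywords[label] = [w for _, w in ranked[:top_k]]
    return keywords


# Toy attribution scores, invented for this example only.
toy_docs = [
    ("recipe", [("stir", 0.9), ("the", 0.1), ("oven", 0.7)]),
    ("recipe", [("stir", 0.7), ("the", 0.0), ("salt", 0.5)]),
    ("news", [("reported", 0.8), ("the", 0.1), ("officials", 0.6)]),
    ("news", [("reported", 0.6), ("the", 0.2), ("oven", 0.1)]),
]
class_keywords = aggregate_class_keywords(toy_docs)
print(class_keywords)
```

The document-frequency threshold keeps a word that scores highly in a single document from dominating a class description. This sketch does not check that a keyword is discriminative across classes, so a common word such as "the" can still appear in several lists; a fuller treatment would need an extra filtering step.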

