AI CL LG | Nov 7, 2024

DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models

Zijian Zhang, Vinay Setty, Yumeng Wang, Avishek Anand

arXiv:2411.04649v1 | 2.3 | h-index: 14 | MAI-XAI@ECAI

Originality: Incremental advance

AI Analysis

The paper addresses the need for interpretable explanations of text classification models, helping human inspectors identify overfitting and spurious correlations. It is an incremental advance, since it builds on existing interpretability methods.

The paper tackles the problem of interpreting over-parameterized neural language models by introducing DISCO, a method that discovers global, rule-based explanations through causal n-gram associations. On the MultiRC dataset it detects all manually introduced shortcuts (100% detection rate), and the abstract reports an 18.8% regression in model performance in connection with those shortcuts.

With the rapid advancement of neural language models, the deployment of over-parameterized models has surged, increasing the need for interpretable explanations comprehensible to human inspectors. Existing post-hoc interpretability methods, which often focus on unigram features of single input textual instances, fail to capture the models' decision-making process fully. Additionally, many methods do not differentiate between decisions based on spurious correlations and those based on a holistic understanding of the input. Our paper introduces DISCO, a novel method for discovering global, rule-based explanations by identifying causal n-gram associations with model predictions. This method employs a scalable sequence mining technique to extract relevant text spans from training data, associate them with model predictions, and conduct causality checks to distill robust rules that elucidate model behavior. These rules expose potential overfitting and provide insights into misleading feature combinations. We validate DISCO through extensive testing, demonstrating its superiority over existing methods in offering comprehensive insights into complex model behaviors. Our approach successfully identifies all shortcuts manually introduced into the training data (100% detection rate on the MultiRC dataset), resulting in an 18.8% regression in model performance -- a capability unmatched by any other method. Furthermore, DISCO supports interactive explanations, enabling human inspectors to distinguish spurious causes in the rule-based output. This alleviates the burden of abundant instance-wise explanations and helps assess the model's risk when encountering out-of-distribution (OOD) data.
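The pipeline described in the abstract (mine text spans, associate them with model predictions, then check causality) can be illustrated with a small sketch. This is not the authors' implementation: the function names, thresholds, toy classifier, and toy corpus are all hypothetical. Exhaustive n-gram counting stands in for the scalable sequence mining technique, and a simple insertion test stands in for the paper's causality check.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], max_n: int) -> Set[Ngram]:
    """All distinct n-grams of length 1..max_n in one tokenized text."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def mine_candidate_rules(texts, predict, max_n=2, min_support=2,
                         min_confidence=0.9) -> Dict[Ngram, int]:
    """Keep n-grams that are frequent and almost always co-occur with one predicted label."""
    support = Counter()
    by_label = defaultdict(Counter)
    for text in texts:
        label = predict(text)
        for gram in ngrams(text.split(), max_n):
            support[gram] += 1
            by_label[gram][label] += 1
    rules = {}
    for gram, count in support.items():
        if count < min_support:
            continue
        label, hits = by_label[gram].most_common(1)[0]
        if hits / count >= min_confidence:
            rules[gram] = label
    return rules


def flip_rate(gram: Ngram, label: int, texts, predict) -> float:
    """Assumed causality proxy: prepend the n-gram to texts predicted otherwise
    and measure how often the prediction changes to `label`."""
    others = [t for t in texts if predict(t) != label]
    if not others:
        return 0.0
    flipped = sum(predict(" ".join(gram) + " " + t) == label for t in others)
    return flipped / len(others)


def discover_rules(texts, predict: Callable[[str], int],
                   min_flip: float = 0.9) -> Dict[Ngram, int]:
    """Candidate rules that also pass the insertion-based check."""
    candidates = mine_candidate_rules(texts, predict)
    return {g: l for g, l in candidates.items()
            if flip_rate(g, l, texts, predict) >= min_flip}


def toy_model(text: str) -> int:
    """Hypothetical classifier that has overfit to the shortcut token 'zzz'."""
    tokens = text.split()
    return 1 if "zzz" in tokens or "good" in tokens else 0


TEXTS = [
    "zzz the plot was dull", "zzz the acting was good", "the plot was dull",
    "the acting was good", "zzz a dull film", "a dull film",
]

rules = discover_rules(TEXTS, toy_model)
for gram, label in sorted(rules.items()):
    print(" ".join(gram), "->", label)
```

In this toy setup, n-grams such as "acting" correlate with the positive label but do not change the prediction when inserted, so they are filtered out. Both the injected token "zzz" and the genuine cue "good" survive the check, and it is then up to a human inspector to judge which of the surviving rules is spurious.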
