QUACKIE: A NLP Classification Task With Ground Truth Explanations
This work addresses the problem of biased human-provided ground truths for evaluating NLP interpretability methods, which is crucial for researchers developing and assessing these techniques.
The authors propose a new classification task derived from question-answering datasets where the interpretability ground truth is inherent to the problem definition, avoiding human bias. They use this to create a benchmark and evaluate various state-of-the-art NLP interpretability methods.
NLP interpretability aims to increase trust in model predictions, which makes evaluating interpretability approaches a pressing issue. Several datasets exist for evaluating NLP interpretability, but their dependence on human-provided ground truths raises questions about potential bias. In this work, we take a different approach and formulate a specific classification task by repurposing question-answering datasets. For this custom classification task, the interpretability ground truth arises directly from the definition of the classification problem. We use this method to propose a benchmark and lay the groundwork for future research in NLP interpretability by evaluating a wide range of current state-of-the-art methods.
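The abstract does not spell out how the classification task is built, so the following is only a minimal sketch under assumed details. It assumes a question-answering sample provides a context, a question, and the character offset of the answer; that the classifier decides whether the context answers the question; and that the ground-truth explanation is the context sentence containing the answer. The names `QASample`, `split_sentences`, `ground_truth_sentence`, and `top_sentence_hit` are hypothetical, and the regex sentence splitter is a naive stand-in for a real one.

```python
import re
from dataclasses import dataclass


@dataclass
class QASample:
    """Hypothetical container for one question-answering example."""
    context: str
    question: str
    answer_start: int  # character offset of the answer inside `context`


def split_sentences(text):
    """Naive splitter (assumption): returns (start, end) character spans."""
    spans = []
    for match in re.finditer(r"[^.!?]+[.!?]?", text):
        if match.group().strip():
            spans.append((match.start(), match.end()))
    return spans


def ground_truth_sentence(sample):
    """Index of the sentence containing the answer.

    Under the assumed task definition, this index is the explanation
    ground truth: it follows from the answer span, not from annotators.
    """
    for idx, (start, end) in enumerate(split_sentences(sample.context)):
        if start <= sample.answer_start < end:
            return idx
    raise ValueError("answer_start lies outside every sentence")


def top_sentence_hit(sentence_scores, gt_idx):
    """True if the highest-scored sentence is the ground-truth sentence.

    `sentence_scores` stands for per-sentence importance scores produced
    by some interpretability method (not implemented here).
    """
    best = max(range(len(sentence_scores)), key=sentence_scores.__getitem__)
    return best == gt_idx


context = ("Paris is the capital of France. It lies on the Seine. "
           "The city hosts the Louvre.")
sample = QASample(
    context=context,
    question="Which river runs through Paris?",
    answer_start=context.index("Seine"),
)
gt = ground_truth_sentence(sample)
```

Because the ground-truth index is computed from the answer offset alone, no human judgment about which sentence "explains" the prediction enters the evaluation; an interpretability method would be scored by comparing its sentence ranking against `gt`, for example with `top_sentence_hit`.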