Optimal and efficient text counterfactuals using Graph Neural Networks
This work addresses the interpretability of NLP models used in decision-making, though it appears incremental because it builds on existing counterfactual editing approaches.
The authors tackle the need for explainability in NLP models by proposing a framework that generates counterfactual interventions, that is, edits that change the model's prediction, and report faster processing than state-of-the-art counterfactual editors on binary sentiment classification and topic classification.
As NLP models become increasingly integral to decision-making processes, the need for explainability and interpretability has become paramount. In this work, we propose a framework that addresses this need by generating semantically edited inputs, known as counterfactual interventions, which change the model prediction and thus provide counterfactual explanations for the model. We test our framework on two NLP tasks - binary sentiment classification and topic classification - and show that the generated edits are contrastive, fluent and minimal, while the whole process remains significantly faster than other state-of-the-art counterfactual editors.
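To make the notion of a contrastive and minimal counterfactual intervention concrete, the following is a minimal sketch and not the framework proposed in this work. It assumes a toy lexicon-based classifier (`predict`) in place of a real NLP model and a hypothetical hand-written substitution table (`SUBSTITUTES`), and it uses a brute-force search over word substitutions rather than any graph-based optimization. All names are illustrative assumptions.

```python
from itertools import combinations

# Toy lexicon-based sentiment classifier: an assumed stand-in for a real model.
POSITIVE = {"great", "good", "enjoyable"}
NEGATIVE = {"terrible", "bad", "boring"}


def predict(tokens):
    """Return 'positive' if positive words outnumber negative ones, else 'negative'."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"


# Hypothetical substitution candidates (one replacement per editable word).
SUBSTITUTES = {
    "great": "terrible", "good": "bad", "enjoyable": "boring",
    "terrible": "great", "bad": "good", "boring": "enjoyable",
}


def minimal_counterfactual(tokens):
    """Try edits of increasing size; return the first edit that flips the label.

    Returns (edited_tokens, number_of_substitutions), or (None, 0) if no
    combination of the available substitutions changes the prediction.
    """
    original = predict(tokens)
    editable = [i for i, t in enumerate(tokens) if t in SUBSTITUTES]
    for k in range(1, len(editable) + 1):          # smallest edits first
        for positions in combinations(editable, k):
            edited = list(tokens)
            for i in positions:
                edited[i] = SUBSTITUTES[edited[i]]
            if predict(edited) != original:        # contrastive: label changed
                return edited, k
    return None, 0


if __name__ == "__main__":
    sentence = "the plot was great and the acting was good".split()
    edited, n_edits = minimal_counterfactual(sentence)
    print(predict(sentence), "->", predict(edited), "with", n_edits, "edit(s)")
    print(" ".join(edited))
```

Searching edit sizes in increasing order guarantees that the returned edit uses the fewest substitutions available in the table, but the cost grows combinatorially with the number of editable words. That cost is the kind of inefficiency a faster counterfactual editor would need to avoid. This sketch also makes no attempt to check fluency.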