Towards Faithful Explanations: Boosting Rationalization with Shortcuts Discovery
This work addresses unfaithful explanations, a concern for users who rely on interpretable models, though the contribution is incremental because it builds on existing rationalization methods.
The paper tackles the problem that selective rationalization models exploit shortcuts in the data when composing rationales. It proposes a method that discovers these shortcuts and uses them to improve faithfulness, and experiments on real-world datasets validate its effectiveness.
The remarkable success of neural networks has spurred interest in selective rationalization, which explains a prediction by identifying a small subset of the input that is sufficient to support it. Existing methods still tend to adopt shortcuts in the data when composing rationales, and they are limited by the scarcity of large-scale human-annotated rationales. In this paper, we propose a Shortcuts-fused Selective Rationalization (SSR) method, which boosts rationalization by discovering and exploiting potential shortcuts. Specifically, SSR first designs a shortcuts discovery approach to detect several potential shortcuts. Then, by introducing the identified shortcuts, we propose two strategies to mitigate the problem of using shortcuts to compose rationales. Finally, we develop two data augmentation methods to close the gap in the number of annotated rationales. Extensive experimental results on real-world datasets validate the effectiveness of our proposed method.
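To make the select-then-predict idea behind selective rationalization concrete, the following is a minimal toy sketch and not the SSR method itself. It assumes a hand-written weight table (`TOY_WEIGHTS`) in place of a learned selector and predictor, and the function names `select_rationale`, `predict`, and `rationalize` are hypothetical. The only point it illustrates is that the predictor sees nothing but the selected subset of tokens.

```python
from typing import Dict, List, Tuple

# Hypothetical per-token weights standing in for learned selector/predictor parameters.
TOY_WEIGHTS: Dict[str, float] = {
    "great": 2.0,
    "delicious": 1.5,
    "terrible": -2.0,
    "bland": -1.5,
}


def select_rationale(tokens: List[str], k: int) -> List[int]:
    """Return the sorted indices of the k tokens with the largest absolute weight."""
    ranked = sorted(
        range(len(tokens)),
        key=lambda i: abs(TOY_WEIGHTS.get(tokens[i].lower(), 0.0)),
        reverse=True,
    )
    return sorted(ranked[:k])


def predict(tokens: List[str], rationale: List[int]) -> str:
    """Classify using only the selected tokens; all other tokens are masked out."""
    score = sum(TOY_WEIGHTS.get(tokens[i].lower(), 0.0) for i in rationale)
    return "positive" if score >= 0 else "negative"


def rationalize(text: str, k: int = 2) -> Tuple[str, List[str]]:
    """Select a k-token rationale, then predict from that rationale alone."""
    tokens = text.split()
    indices = select_rationale(tokens, k)
    return predict(tokens, indices), [tokens[i] for i in indices]


if __name__ == "__main__":
    label, rationale = rationalize("The soup was bland but the service was great")
    print(label, rationale)
```

In this toy setting, a shortcut would correspond to a token that correlates with the label in the training data without actually supporting it; a selector that picks such a token yields an unfaithful rationale, which is the failure mode the abstract describes.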