CL CR LG | Apr 25, 2022

Can Rationalization Improve Robustness?

Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen

Princeton

arXiv:2204.11790v232.4652 citations | h-index: 55 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses adversarial robustness in NLP, a problem relevant to both researchers and practitioners. The advance is incremental: it builds on existing rationale models and explores a new application for them rather than a new modeling approach.

The paper investigates whether rationale models, which produce interpretable subsets of input to explain predictions, can improve robustness against adversarial attacks by masking out noise or attack text. Experiments across five tasks show these models offer promise for robustness but struggle with positional bias and lexical sensitivity, with human supervision not consistently enhancing performance.

A growing line of work has investigated the development of neural NLP models that can produce rationales--subsets of input that can explain their model predictions. In this paper, we ask whether such rationale models can also provide robustness to adversarial attacks in addition to their interpretable nature. Since these models need to first generate rationales ("rationalizer") before making predictions ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the generated rationale. To this end, we systematically generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks, and perform an extensive empirical evaluation of state-of-the-art rationale models across five different tasks. Our experiments reveal that the rationale models show the promise to improve robustness, while they struggle in certain scenarios--when the rationalizer is sensitive to positional bias or lexical choices of attack text. Further, leveraging human rationale as supervision does not always translate to better performance. Our study is a first step towards exploring the interplay between interpretability and robustness in the rationalize-then-predict framework.
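
The rationalize-then-predict idea described in the abstract can be illustrated with a toy sketch. This is not the authors' code or any of the evaluated models: the cue-word lists, the scoring rule, and the function names (`add_text_attack`, `rationalize`, `predict`) are all hypothetical, chosen only to show how a rationalizer might mask inserted text, and how attack text with the "right" lexical choices might still slip into the rationale.

```python
from typing import List

# Hypothetical cue words; a real rationalizer and predictor would be learned models.
POSITIVE = ("great", "wonderful")
NEGATIVE = ("awful", "terrible", "horrible")


def cue_score(sentence: str) -> int:
    """Toy relevance score: number of sentiment cue-word occurrences (substring match)."""
    text = sentence.lower()
    return sum(text.count(w) for w in POSITIVE + NEGATIVE)


def add_text_attack(sentences: List[str], attack: str, position: int) -> List[str]:
    """Toy 'AddText' attack: insert one extra sentence at the given index."""
    return sentences[:position] + [attack] + sentences[position:]


def rationalize(sentences: List[str], k: int = 1) -> List[str]:
    """Toy rationalizer: keep the k highest-scoring sentences, in original order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cue_score(sentences[i]),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def predict(rationale: List[str]) -> str:
    """Toy predictor: sees only the rationale, never the masked-out sentences."""
    text = " ".join(rationale).lower()
    pos = sum(text.count(w) for w in POSITIVE)
    neg = sum(text.count(w) for w in NEGATIVE)
    return "positive" if pos >= neg else "negative"


review = [
    "The plot was great and the acting was wonderful.",
    "I watched it on a Tuesday.",
]

# Neutral inserted text gets a low score, so the rationalizer masks it out.
neutral = add_text_attack(review, "The sky is blue.", position=0)

# Inserted text that mimics the cues the rationalizer looks for is selected instead.
lexical = add_text_attack(
    review, "The popcorn was awful, terrible, and horrible.", position=0
)

clean_pred = predict(rationalize(review))
neutral_pred = predict(rationalize(neutral))
lexical_pred = predict(rationalize(lexical))
print(clean_pred, neutral_pred, lexical_pred)
```

In this toy setup the neutral insertion is dropped from the rationale and the prediction is unchanged, while the cue-heavy insertion displaces the original evidence and changes the prediction. That mirrors, in a deliberately simplified way, the abstract's point that robustness depends on how the rationalizer reacts to the lexical choices of the attack text.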
