CL · Jun 11, 2022

Improving the Adversarial Robustness of NLP Models by Information Bottleneck

arXiv:2206.05511v1 · 646 citations · h-index: 84
Originality: Incremental advance
AI Analysis

This work addresses adversarial robustness in NLP models and offers a new defense method that outperforms previous approaches. Its contribution is incremental, however: it applies existing information bottleneck theory to this specific domain.

The paper tackles adversarial examples in NLP models by using information bottleneck theory to capture robust features and eliminate non-robust ones. This yields significant improvements in robust accuracy with almost no drop in clean accuracy on the SST-2, AGNEWS, and IMDB datasets.

Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features while eliminating the non-robust ones by using the information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods while suffering almost no drop in clean accuracy on the SST-2, AGNEWS, and IMDB datasets.
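The information bottleneck idea summarized above trades off two terms: keep a representation Z predictive of the label Y (maximize I(Z;Y)) while compressing away information about the input X (minimize I(X;Z)), with a weight beta between them. The sketch below uses the generic variational information bottleneck (VIB) loss of Alemi et al. (2017) to show what such an objective looks like; it is an illustration under stated assumptions, not the authors' exact formulation. The encoder (a pretrained transformer in an NLP setting) is assumed to have already produced a Gaussian mean `mu` and log-variance `log_var` per example. All function and parameter names (`vib_loss`, `classifier_w`, `beta`, and so on) are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims, one value per example."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between integer labels and unnormalized logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def vib_loss(mu, log_var, classifier_w, classifier_b, labels, beta=1e-3, rng=None):
    """Generic VIB objective (hypothetical helper): cross-entropy + beta * KL(q(z|x) || N(0, I))."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Reparameterization trick: sample z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    logits = z @ classifier_w + classifier_b
    # Minimizing cross-entropy maximizes a variational lower bound on I(Z; Y).
    ce = softmax_cross_entropy(logits, labels)
    # The KL to the prior is a variational upper bound on I(X; Z) (the compression term).
    kl = kl_to_standard_normal(mu, log_var)
    return float(np.mean(ce + beta * kl))


# Usage with random stand-ins for encoder outputs (batch of 4, latent size 8, 2 classes).
rng = np.random.default_rng(42)
mu = rng.standard_normal((4, 8))
log_var = rng.standard_normal((4, 8)) * 0.1
w = rng.standard_normal((8, 2))
b = np.zeros(2)
y = np.array([0, 1, 1, 0])
print(vib_loss(mu, log_var, w, b, y, beta=0.01))
```

A larger `beta` pushes the encoder to discard more input information. The hope behind this family of methods is that the discarded information includes the brittle, non-robust features that adversaries exploit.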

Code Implementations: 1 repo