CL · CR · LG · Nov 5, 2022

Textual Manifold-based Defense Against Natural Language Adversarial Examples

MIT
arXiv:2211.02878v1 · 302 citations · h-index: 34 · Has Code
Originality: Highly original
AI Analysis

This paper addresses adversarial robustness in NLP models, a concern for both researchers and practitioners. It is a novel application of manifold-based defenses, previously developed in computer vision, to NLP.

The authors defend against adversarial attacks in natural language processing by projecting text embeddings onto an approximated embedding manifold before classification. They report that the method consistently and significantly outperforms previous defenses without sacrificing clean accuracy.

Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, making them significantly more challenging for current models to make correct predictions. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models, in which adversarial texts tend to have their embeddings diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. It reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. Through extensive experiments, our method consistently and significantly outperforms previous defenses under various attack settings without trading off clean accuracy. To the best of our knowledge, this is the first NLP defense that leverages the manifold structure against adversarial attacks. Our code is available at https://github.com/dangne/tmd.
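
The idea of projecting an embedding onto an approximated manifold before classification can be illustrated with a toy sketch. The abstract does not say how the manifold is approximated, so the example below swaps in a deliberately simple stand-in: a linear PCA subspace fitted to synthetic "natural" embeddings. The function names (`fit_linear_manifold`, `project`, `off_manifold_distance`), the dimensions, and the synthetic data are all assumptions made for illustration; this is not the authors' implementation, which is available in the linked repository.

```python
import numpy as np


def fit_linear_manifold(embeddings, k):
    """Approximate the manifold of natural embeddings by a k-dim PCA subspace.

    Assumption: a linear subspace stands in for the learned manifold.
    """
    mean = embeddings.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:k]


def project(x, mean, basis):
    """Orthogonally project embedding(s) x onto the fitted subspace."""
    return mean + (x - mean) @ basis.T @ basis


def off_manifold_distance(x, mean, basis):
    """Euclidean distance between x and its projection onto the subspace."""
    return np.linalg.norm(x - project(x, mean, basis), axis=-1)


rng = np.random.default_rng(0)

# Synthetic "natural" embeddings: 200 points near a 2-D plane in 16-D space.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 16))
natural = latent @ mixing + 0.01 * rng.normal(size=(200, 16))

mean, basis = fit_linear_manifold(natural, k=2)

# A hypothetical "adversarial" embedding: a natural one pushed off the plane.
clean = natural[0]
perturbed = clean + 0.5 * rng.normal(size=16)

# The defense step: project before handing the embedding to a classifier.
projected = project(perturbed, mean, basis)

print("clean distance:    ", off_manifold_distance(clean, mean, basis))
print("perturbed distance:", off_manifold_distance(perturbed, mean, basis))
print("projected distance:", off_manifold_distance(projected, mean, basis))
```

In this toy setting the perturbed point sits much farther from the subspace than the clean one, and projection removes the off-subspace component before any downstream classifier sees it. A real contextualized embedding space is unlikely to be well described by a linear subspace, so the sketch shows only the shape of the pipeline, not its expected effectiveness.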

Code Implementations: 1 repo
