CL · Jun 8, 2022

Adversarial Text Normalization

Joanna Bitton, Maya Pavlova, Ivan Evtimov

arXiv:2206.04137v1 · 31.7628 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient defenses against adversarial attacks that ordinary users can easily mount against text models. It is incremental in that it supplements existing retraining solutions rather than replacing them.

The paper tackles the problem of character-level adversarial attacks on text models by proposing the Adversarial Text Normalizer, a lightweight method that restores baseline performance with low computational overhead, as demonstrated on Hate Speech and Natural Language Inference tasks.

Text-based adversarial attacks are becoming more commonplace and accessible to general internet users. As these attacks proliferate, the need to address the gap in model robustness becomes pressing. While retraining on adversarial data may increase performance, there remains an additional class of character-level attacks on which these models falter. Additionally, the process of retraining a model is time- and resource-intensive, creating a need for a lightweight, reusable defense. In this work, we propose the Adversarial Text Normalizer, a novel method that restores baseline performance on attacked content with low computational overhead. We evaluate the efficacy of the normalizer on two problem areas prone to adversarial attacks, namely Hate Speech and Natural Language Inference. We find that text normalization provides a task-agnostic defense against character-level attacks that can be implemented as a supplement to adversarial retraining solutions, which are better suited to semantic alterations.
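To make the idea of a lightweight, character-level normalization step concrete, here is a minimal sketch in Python. It is not the paper's Adversarial Text Normalizer: the abstract does not list the actual rules, so the function name `normalize_text`, the substitution table `LEET_MAP`, and the specific steps (Unicode folding, zero-width character removal, accent stripping, leetspeak reversal, repeat collapsing) are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Hypothetical substitution table (an assumption for illustration only;
# the paper's real mappings are not given in the abstract).
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Invisible characters sometimes inserted to split words.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

# Any character repeated three or more times in a row.
REPEATS = re.compile(r"(.)\1{2,}")


def normalize_text(text: str) -> str:
    """Undo a few common character-level perturbations before classification."""
    # Fold compatibility forms (fullwidth letters, ligatures) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Drop zero-width characters.
    text = ZERO_WIDTH.sub("", text)
    # Strip combining marks such as accents.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = text.lower()
    # Reverse leetspeak only in tokens that contain a letter,
    # so standalone numbers such as "3" are left alone.
    tokens = [
        tok.translate(LEET_MAP) if any(c.isalpha() for c in tok) else tok
        for tok in text.split(" ")
    ]
    text = " ".join(tokens)
    # Collapse long character runs to two, then tidy whitespace.
    text = REPEATS.sub(r"\1\1", text)
    return re.sub(r"\s+", " ", text).strip()
```

A step like this would run on the input before it reaches the classifier, which is why it can be reused across tasks without retraining. Rule-based mappings of this kind can also damage benign text (for example, product codes that mix letters and digits), so any real deployment would need to measure the effect on clean inputs.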
