Context-aware Adversarial Attack on Named Entity Recognition
This work addresses the robustness of NLP models, which matters for security applications, but its contribution is incremental because it builds on existing adversarial attack methods.
The paper tackles the vulnerability of pre-trained language models in named entity recognition by developing context-aware adversarial attacks that perturb informative words, achieving higher deception rates than strong baselines.
In recent years, large pre-trained language models (PLMs) have achieved remarkable performance on many natural language processing benchmarks. Despite their success, prior studies have shown that PLMs are vulnerable to attacks from adversarial examples. In this work, we focus on the named entity recognition task and study context-aware adversarial attack methods to examine the model's robustness. Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples and investigate different candidate replacement methods to generate natural and plausible adversarial examples. Experiments and analyses show that our methods are more effective in deceiving the model into making wrong predictions than strong baselines.
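The general idea of perturbing informative words can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how informativeness is measured or how replacement candidates are generated. The sketch assumes that importance is estimated by masking one non-entity word at a time and counting how many entity predictions flip, and that replacements come from a small hand-written table. The names `toy_tagger`, `rank_informative_words`, `attack`, and the candidate table are all hypothetical, and the rule-based tagger merely stands in for a PLM-based NER model.

```python
from typing import Callable, Dict, List, Optional

Tagger = Callable[[List[str]], List[str]]

# Hypothetical cue words used by the toy tagger below.
TITLES = {"president", "dr.", "mr."}
PREPOSITIONS = {"in", "at"}


def toy_tagger(tokens: List[str]) -> List[str]:
    """Stand-in for an NER model: labels a capitalized token from the word before it."""
    labels = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1].lower() if i > 0 else ""
        if tok[:1].isupper() and prev in TITLES:
            labels.append("PER")
        elif tok[:1].isupper() and prev in PREPOSITIONS:
            labels.append("LOC")
        else:
            labels.append("O")
    return labels


def rank_informative_words(tokens: List[str], tagger: Tagger,
                           mask: str = "[MASK]") -> List[int]:
    """Rank non-entity positions by how many entity labels flip when masked (assumed measure)."""
    base = tagger(tokens)
    entity_idx = [i for i, lab in enumerate(base) if lab != "O"]
    scores = []
    for i, lab in enumerate(base):
        if lab != "O":
            continue  # assumption: only context words are perturbed
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        pred = tagger(masked)
        flips = sum(pred[j] != base[j] for j in entity_idx)
        scores.append((flips, i))
    # Most disruptive positions first; ties broken by position.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores]


def attack(tokens: List[str], tagger: Tagger,
           candidates: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the first single-word substitution that changes an entity label, if any."""
    base = tagger(tokens)
    entity_idx = [i for i, lab in enumerate(base) if lab != "O"]
    for i in rank_informative_words(tokens, tagger):
        for sub in candidates.get(tokens[i], []):
            perturbed = tokens[:i] + [sub] + tokens[i + 1:]
            pred = tagger(perturbed)
            if any(pred[j] != base[j] for j in entity_idx):
                return perturbed
    return None


sentence = ["President", "Obama", "spoke", "in", "Chicago"]
# Hypothetical replacement table; the paper investigates several candidate methods.
table = {"President": ["Leader"], "in": ["near"]}
adversarial = attack(sentence, toy_tagger, table)
```

In this toy setting, replacing the title word is enough to make the tagger miss the person entity while the sentence remains readable. A realistic attack would replace `toy_tagger` with a fine-tuned PLM and use model confidence rather than label flips as the importance signal.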