A Multilingual Evaluation of NER Robustness to Adversarial Inputs
This paper addresses model robustness for NER applications in multilingual settings. The contribution is incremental, since it extends existing adversarial evaluation methods to non-English languages.
The paper evaluates and improves the robustness of Named Entity Recognition (NER) models to adversarial inputs across multiple languages. It finds that models for English, German, and Hindi are not very robust, and that using adversarial data for augmentation or fine-tuning improves performance on both the original and the adversarial test sets.
Adversarial evaluations of language models typically focus on English alone. In this paper, we performed a multilingual evaluation of Named Entity Recognition (NER) in terms of its robustness to small perturbations in the input. Our results showed that the NER models we explored across three languages (English, German, and Hindi) are not very robust to such changes, as indicated by the fluctuations in the overall F1 score as well as in a more fine-grained evaluation. With that knowledge, we further explored whether it is possible to improve the existing NER models by using part of the generated adversarial data sets, either as augmented training data to train a new NER model or as fine-tuning data to adapt an existing one. Our results showed that both approaches improve performance on the original as well as the adversarial test sets. While there is no significant difference between the two approaches for English, re-training with augmented data is significantly better than fine-tuning for German and Hindi.
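The evaluation idea described above (perturb the input slightly, then compare F1 before and after) can be illustrated with a minimal sketch. The abstract does not say which perturbations were used, so the adjacent-character swap below is only an assumed stand-in, and all function names (`swap_adjacent_chars`, `perturb_sentence`, `extract_spans`, `entity_f1`) are hypothetical. The sketch assumes BIO-tagged data and exact-match, entity-level F1.

```python
import random
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def swap_adjacent_chars(token: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters; tokens shorter than 4 are unchanged.

    Assumed perturbation for illustration only, not the paper's method.
    """
    if len(token) < 4:
        return token
    i = rng.randrange(1, len(token) - 2)
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb_sentence(tokens: List[str], labels: List[str],
                     rate: float, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Perturb non-entity tokens with probability `rate`; labels are kept as-is."""
    rng = random.Random(seed)
    new_tokens = [
        swap_adjacent_chars(tok, rng) if lab == "O" and rng.random() < rate else tok
        for tok, lab in zip(tokens, labels)
    ]
    return new_tokens, list(labels)


def extract_spans(labels: List[str]) -> Set[Span]:
    """Collect entity spans from BIO labels; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        inside = lab.startswith("I-") and etype == lab[2:]
        if not inside:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if lab.startswith("B-"):
                start, etype = i, lab[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match, entity-level F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = extract_spans(g), extract_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a robustness check would run the same model on the original and the perturbed tokens and compare `entity_f1` for the two sets of predictions; the gap between the two scores is the fluctuation the abstract refers to. The perturbed sentences, paired with their unchanged labels, could also serve as the augmentation or fine-tuning data.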