CL, LG · Apr 7, 2020

Towards Evaluating the Robustness of Chinese BERT Classifiers

arXiv:2004.03742v1 · 9 citations
Originality: Incremental advance
AI Analysis

The paper addresses a robustness problem in Chinese NLP applications, but the advance is incremental: it adapts existing adversarial attack methods to Chinese.

The paper tackles the vulnerability of Chinese BERT classifiers to character-level adversarial attacks, showing that classification accuracy on a Chinese news dataset drops from 91.8% to 0% when fewer than 2 characters are manipulated on average.

Recent advances in large-scale language representation models such as BERT have improved state-of-the-art performance on many NLP tasks. Meanwhile, character-level Chinese NLP models, including BERT for Chinese, have also been shown to outperform existing models. In this paper, however, we show that such BERT-based models are vulnerable to character-level adversarial attacks. We propose a novel Chinese character-level attack method against BERT-based classifiers. Essentially, we generate a "small" perturbation at the character level in the embedding space and use it to guide the character substitution procedure. Extensive experiments show that, under the proposed attack, classification accuracy on a Chinese news dataset drops from 91.8% to 0% when fewer than 2 characters are manipulated on average. Human evaluations also confirm that our generated Chinese adversarial examples barely affect human performance on these NLP tasks.
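
The idea of perturbing a character in embedding space and then snapping back to a real character can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give. It is a toy version under loud assumptions: a mean-pooled linear classifier stands in for BERT, and the names `E` (character embedding table), `W` (classifier weights), `eps` (step size), `loss_and_grad`, and `substitute_attack` are all hypothetical.

```python
import numpy as np

def loss_and_grad(E, ids, W, label):
    """Cross-entropy loss of a toy mean-pooled linear classifier,
    plus its gradient with respect to each character embedding."""
    h = E[ids].mean(axis=0)                 # pooled sentence vector, shape (d,)
    z = W @ h                               # logits, shape (num_classes,)
    p = np.exp(z - z.max())
    p /= p.sum()                            # softmax probabilities
    loss = -np.log(p[label])
    dz = p.copy()
    dz[label] -= 1.0                        # d(loss)/d(logits)
    dh = W.T @ dz                           # d(loss)/d(pooled vector)
    # Mean pooling spreads the gradient evenly over all positions.
    grad = np.tile(dh / len(ids), (len(ids), 1))
    return loss, grad

def substitute_attack(E, ids, W, label, eps=0.5, max_subs=2):
    """Greedy character substitution guided by embedding-space gradients.
    Assumed toy procedure, not the method from the paper."""
    ids = list(ids)
    changed = []
    for _ in range(max_subs):
        # Stop as soon as the toy classifier is fooled.
        if int(np.argmax(W @ E[ids].mean(axis=0))) != label:
            break
        loss, grad = loss_and_grad(E, ids, W, label)
        best = None
        for pos in range(len(ids)):
            if pos in changed:
                continue
            g = grad[pos]
            # Small step in the loss-increasing direction in embedding space.
            target = E[ids[pos]] + eps * g / (np.linalg.norm(g) + 1e-12)
            # Snap to the nearest real character other than the current one.
            dists = np.linalg.norm(E - target, axis=1)
            dists[ids[pos]] = np.inf
            cand = int(np.argmin(dists))
            trial = ids.copy()
            trial[pos] = cand
            trial_loss, _ = loss_and_grad(E, trial, W, label)
            if best is None or trial_loss > best[0]:
                best = (trial_loss, pos, cand)
        # Keep a substitution only if it actually increases the loss.
        if best is None or best[0] <= loss:
            break
        ids[best[1]] = best[2]
        changed.append(best[1])
    return ids, changed
```

With a real model, the gradient would come from backpropagation through the classifier, and the candidate set would likely be restricted to characters that keep the text readable to humans, in line with the human evaluation the abstract reports.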

