LG May 15, 2022

Learn2Weight: Parameter Adaptation against Similar-domain Adversarial Attacks

arXiv:2205.07315v251.3580 citations, h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses machine learning safety by improving defense against adversarial attacks in NLP, but it is incremental as it builds on existing domain adaptation and adversarial defense concepts.

The paper tackles defense against black-box adversarial attacks in NLP. It proposes Learn2Weight, a method that adapts model parameters to protect against similar-domain adversarial examples, and reports that the method is effective on Amazon sentiment datasets compared to standard defenses such as adversarial training.

Black-box adversarial attacks on NLP systems have recently attracted much attention. Prior black-box attacks assume that attackers can observe the output labels a target model produces for selected inputs. In this work, inspired by adversarial transferability, we propose a new type of black-box NLP adversarial attack in which an attacker chooses a similar domain and transfers adversarial examples from it to the target domain, causing poor performance in the target model. Based on domain adaptation theory, we then propose a defensive strategy, called Learn2Weight, which learns to predict weight adjustments for a target model in order to defend against an attack of similar-domain adversarial examples. Using Amazon multi-domain sentiment classification datasets, we empirically show that Learn2Weight is effective against the attack relative to standard black-box defense methods such as adversarial training and defensive distillation. This work contributes to the growing literature on machine learning safety.
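
The idea of predicting weight adjustments can be illustrated with a toy sketch. This is not the authors' implementation: the linear classifier, the two-feature inputs, the hard-coded target deltas, and the helper names (`predict`, `fit_adjuster`, `adjusted_weights`) are all assumptions made for illustration. The sketch fits a small linear map from an input to a weight delta, then classifies with the base weights plus the predicted delta.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(weights, x):
    """Toy linear sentiment classifier: 1 (positive) if the score is > 0, else 0."""
    return 1 if dot(weights, x) > 0 else 0

def fit_adjuster(pairs, dim, lr=0.1, epochs=200):
    """Fit a dim x dim matrix M by SGD on squared error so that M @ x ~ delta."""
    M = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, delta in pairs:
            for i in range(dim):
                err = dot(M[i], x) - delta[i]
                for j in range(dim):
                    M[i][j] -= lr * err * x[j]
    return M

def adjusted_weights(base, M, x):
    """Base weights plus the input-dependent delta predicted by M."""
    return [w + dot(row, x) for w, row in zip(base, M)]

# Hypothetical setup: feature 0 is a target-domain sentiment cue,
# feature 1 is a cue that only appears in similar-domain text.
base = [1.0, 0.0]          # target model ignores feature 1
x_clean = [1.0, 0.0]       # clean positive review
x_adv = [-0.2, 1.0]        # transferred example, true label positive

# Assumed supervision: the weight delta that would fix each input.
# In practice these would have to come from some training procedure;
# here they are hard-coded for the toy example.
pairs = [(x_clean, [0.0, 0.0]), (x_adv, [0.0, 1.0])]
M = fit_adjuster(pairs, dim=2)

print("base model on transferred example:", predict(base, x_adv))
print("adjusted model on transferred example:",
      predict(adjusted_weights(base, M, x_adv), x_adv))
print("adjusted model on clean example:",
      predict(adjusted_weights(base, M, x_clean), x_clean))
```

The design point the sketch tries to capture is that the defense changes the model's parameters per input rather than filtering or rewriting the input itself.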
