CL, AI, CR, LG | Apr 15, 2021

Gradient-based Adversarial Attacks against Text Transformers

arXiv:2104.13733v1 | 714 citations
Originality: Highly original
AI Analysis

This paper addresses security vulnerabilities of transformer models used in natural language processing.

The authors propose the first general-purpose gradient-based adversarial attack against transformer models. Rather than searching for a single adversarial example, the attack searches for a distribution of adversarial examples. The white-box attack achieves state-of-the-art performance on a variety of natural language tasks, and a black-box transfer attack that samples from the adversarial distribution matches or exceeds existing methods while requiring only hard-label outputs.

We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.
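The core idea in the abstract, a distribution over token sequences parameterized by a continuous-valued matrix, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the matrix name `theta`, the toy vocabulary, the helper names, and the use of a Gumbel-softmax relaxation as one common way to make sampling from a categorical distribution differentiable. The sketch shows only the parameterization and the sampling step (which the abstract describes as the basis of the black-box transfer attack); it does not implement the attack's loss or optimizer.

```python
import math
import random

def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_softmax_row(logits, temperature, rng):
    """Relaxed (continuous) sample for one token position.

    Adds Gumbel(0, 1) noise to the logits and applies a tempered softmax.
    Assumption: this relaxation is used here only as an illustration of
    how a discrete choice can be made amenable to gradients.
    """
    noisy = [x - math.log(-math.log(rng.random() or 1e-12)) for x in logits]
    return softmax(noisy, temperature)

def sample_tokens(theta, vocab, rng):
    """Draw one discrete token per position from softmax(theta[i])."""
    tokens = []
    for row in theta:
        probs = softmax(row)
        tokens.append(rng.choices(vocab, weights=probs, k=1)[0])
    return tokens

if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["good", "great", "bad", "awful"]  # hypothetical toy vocabulary
    # theta has one row per token position and one column per vocabulary entry.
    theta = [
        [2.0, 1.0, 0.0, -1.0],
        [0.0, 0.5, 1.5, 1.0],
        [1.0, 1.0, 1.0, 1.0],
    ]
    # Relaxed sample: each row is a probability vector, not a hard token.
    relaxed = [gumbel_softmax_row(row, temperature=0.5, rng=rng) for row in theta]
    print(relaxed)
    # Hard samples: candidate sequences drawn from the same distribution.
    for _ in range(3):
        print(sample_tokens(theta, vocab, rng))
```

In a full attack, the relaxed rows would be fed to the target model as a soft mixture of token embeddings so that a loss can be differentiated with respect to `theta`; the hard samples correspond to the discrete candidates one would submit to a different model that exposes only predicted labels.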

Code Implementations: 2 repos