LG | CR | Mar 2, 2021

Evaluating the Robustness of Geometry-Aware Instance-Reweighted Adversarial Training

arXiv:2103.01914v2 | 23 citations | Has Code
AI Analysis

This work identifies a vulnerability in a recent state-of-the-art adversarial defense method. The contribution is incremental: it critiques and analyzes an existing approach rather than proposing a new one.

The paper evaluates the adversarial robustness of Geometry-aware Instance-reweighted Adversarial Training (GAIRAT). It finds that GAIRAT improves over regular adversarial training on CIFAR-10, but its loss re-scaling biases the model and leaves it vulnerable to attacks that scale the logits: a crafted PGD attack reduces accuracy from 55% to 44%.

In this technical report, we evaluate the adversarial robustness of a very recent method called "Geometry-aware Instance-reweighted Adversarial Training" (GAIRAT) [7]. GAIRAT reports state-of-the-art results on defenses against adversarial attacks on the CIFAR-10 dataset. We find that a network trained with this method, while showing an improvement over regular adversarial training (AT), is biased towards certain samples because the method re-scales the loss. This bias makes the model susceptible to attacks that scale the logits. The original model shows an accuracy of 59% under AutoAttack when trained with additional data with pseudo-labels. We provide an analysis that shows the opposite. In particular, we craft a PGD attack that multiplies the logits by a positive scalar and decreases the GAIRAT accuracy from 55% to 44% when the model is trained solely on CIFAR-10. In this report, we rigorously evaluate the model and provide insights into the reasons behind the vulnerability of GAIRAT to this adversarial attack. The code to reproduce our evaluation is available at https://github.com/giuxhub/GAIRAT-LSA
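The idea of a PGD attack on scaled logits can be illustrated with a minimal sketch. This is not the authors' implementation (that lives in the repository linked above); it is a toy under stated assumptions: a small linear classifier stands in for the network, the attack is an L-infinity PGD with a sign-gradient step, and the names `alpha`, `scaled_ce_grad`, and `pgd_logit_scaling` are hypothetical. The only change relative to plain PGD is that the logits are multiplied by a positive scalar `alpha` before the softmax cross-entropy loss.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    # Toy linear "network": z = W x + b (an assumption made for illustration)
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def scaled_ce_grad(W, b, x, y, alpha):
    """Gradient w.r.t. x of cross_entropy(softmax(alpha * logits(x)), y)."""
    p = softmax([alpha * v for v in logits(W, b, x)])
    d = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [alpha * sum(d[k] * W[k][i] for k in range(len(W)))
            for i in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_logit_scaling(W, b, x0, y, eps, step, iters, alpha):
    """L-infinity PGD that ascends the loss computed on alpha-scaled logits."""
    x = list(x0)
    for _ in range(iters):
        g = scaled_ce_grad(W, b, x, y, alpha)
        # Ascent step along the sign of the gradient
        x = [xi + step * sign(gi) for xi, gi in zip(x, g)]
        # Project back into the eps-ball around x0, then into the valid range [0, 1]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x

# Hypothetical two-class example; all values are illustrative only
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x0, y = [0.6, 0.4], 0
eps = 0.15
x_adv = pgd_logit_scaling(W, b, x0, y, eps=eps, step=0.05, iters=10, alpha=10.0)
clean, adv = logits(W, b, x0), logits(W, b, x_adv)
print("clean prediction:", clean.index(max(clean)))
print("adversarial prediction:", adv.index(max(adv)))
```

In this toy the scalar does not change the direction of the sign-gradient step, so it only demonstrates where the scaling enters the loss. With a real network, scaling the logits changes the softmax probabilities and therefore the gradient the attack follows, which is the effect the report exploits.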

