LG AI CR · Feb 25, 2023

Scalable Attribution of Adversarial Attacks via Multi-Task Learning

Zhongyi Guo, Keji Han, Yao Ge, Wei Ji, Yun Li

arXiv:2302.14059v15.32 citations · h-index: 16

Originality: Incremental advance

AI Analysis

The work addresses defenders' need to trace the origins of adversarial attacks and optimize defenses. It is incremental in that it builds on prior attribution work by incorporating multi-task learning for improved scalability.

The paper tackles the Adversarial Attribution Problem (AAP) by recognizing the tool-chains behind adversarial examples. It proposes a multi-task learning framework (MTAA) that simultaneously identifies the attack algorithm, victim model, and hyperparameter. Experimental results on MNIST and ImageNet demonstrate feasibility, scalability, and effectiveness in reducing false alarms.
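The scalability claim can be made concrete by counting outputs. A single-label formulation needs one class per (attack algorithm, victim model, hyperparameter) combination, while separate task heads need only one output per signature value. The sketch below is illustrative only: the counts are hypothetical, and it assumes the hyperparameter is discretized, which the paper may not do.

```python
def single_label_classes(n_attacks, n_models, n_hyperparams):
    """Classes needed when every signature combination is its own label."""
    return n_attacks * n_models * n_hyperparams


def multi_task_outputs(n_attacks, n_models, n_hyperparams):
    """Total outputs when each signature gets its own prediction head."""
    return n_attacks + n_models + n_hyperparams


# Hypothetical counts, not taken from the paper.
flat = single_label_classes(5, 4, 6)
heads = multi_task_outputs(5, 4, 6)
print(flat, heads)
```

The single-label count grows multiplicatively with each new signature value, while the multi-head count grows additively.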

Deep neural networks (DNNs) can be easily fooled by adversarial attacks during the inference phase, when attackers add imperceptible perturbations to original examples to produce adversarial examples. Many works focus on adversarial detection and adversarial training to defend against such attacks. However, few works explore the tool-chains behind adversarial examples, which can help defenders find clues about the originator of the attack and their goals, and can provide insight into the most effective defense algorithm against the corresponding attacks. To fill this gap, it is necessary to develop techniques that recognize the tool-chains used to generate adversarial examples, a task called the Adversarial Attribution Problem (AAP). In this paper, AAP is defined as the recognition of three signatures: attack algorithm, victim model, and hyperparameter. Existing works cast AAP as a single-label classification task and ignore the relationships between these signatures. The former leads to a combinatorial explosion as the number of signatures grows; the latter means that AAP cannot be treated simply as a single-task problem. We first conduct experiments to validate the attributability of adversarial examples. We then propose a multi-task learning framework named Multi-Task Adversarial Attribution (MTAA) to recognize the three signatures simultaneously. MTAA contains a perturbation extraction module, an adversarial-only extraction module, and a classification and regression module. It takes the relationship between the attack algorithm and its corresponding hyperparameter into account and uses an uncertainty-weighted loss to adjust the weights of the three recognition tasks. The experimental results on MNIST and ImageNet show the feasibility and scalability of the proposed framework, as well as its effectiveness in dealing with false alarms.
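The abstract does not spell out the uncertainty-weighted loss, so the following is only a minimal sketch of one commonly used form of uncertainty weighting, not the authors' implementation. It assumes each task i has a learnable log-variance s_i and that the total loss is the sum of exp(-s_i) * L_i + s_i. The function name, the per-task loss values, and the log-variance values are all hypothetical.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with log-variances s_i = log(sigma_i^2).

    Assumed form: total = sum_i exp(-s_i) * L_i + s_i.
    A larger s_i down-weights task i, and the additive s_i term
    penalizes inflating the variance without bound.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


# Hypothetical losses for the three signatures:
# attack algorithm, victim model, hyperparameter.
losses = [0.9, 0.4, 2.5]

equal = uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0])
downweighted = uncertainty_weighted_loss(losses, [0.0, 0.0, 1.0])
print(equal, downweighted)
```

With all log-variances at zero the result is the plain sum of the task losses. In a real training loop the log-variances would be trainable parameters optimized jointly with the network, so the balance between the three tasks is learned rather than hand-tuned.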

