LG · CR · Aug 23, 2023

SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks

DeepMind

arXiv:2308.11845v2 · 2.0 · h-index: 26

Originality: Incremental advance

AI Analysis

This work addresses the need for forensic analysis and threat intelligence sharing in ML security. It offers a novel approach to characterizing and explaining attacks, though the advance is incremental because it builds on existing attribution methods.

The paper tackles the problem of profiling and sharing information about query-based black-box attacks on ML systems. It introduces SEA, a system that uses Hidden Markov Models to attribute an observed query sequence to known attacks. SEA recognizes a previously seen attack with more than 90% Top-1 and 95% Top-3 accuracy, and its explanations can identify specific bugs in attack libraries.
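To make the Top-1 and Top-3 figures concrete, here is a minimal sketch of how such a metric is commonly computed. It is not the paper's evaluation code. The function name `top_k_accuracy`, the toy rankings, and the label `other_attack` are illustrative assumptions.

```python
def top_k_accuracy(rankings, truths, k):
    """Fraction of incidents whose true attack appears in the first k guesses.

    rankings: one list per incident, attack names ordered from most to
              least likely (hypothetical output of an attribution system).
    truths:   the attack that actually produced each incident.
    """
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, truths))
    return hits / len(truths)


# Toy data, invented purely for illustration.
rankings = [
    ["square", "signopt", "other_attack"],
    ["other_attack", "square", "signopt"],
    ["signopt", "other_attack", "square"],
    ["square", "other_attack", "signopt"],
]
truths = ["square", "square", "signopt", "signopt"]

top1 = top_k_accuracy(rankings, truths, k=1)
top3 = top_k_accuracy(rankings, truths, k=3)
```

Top-3 accuracy can never be lower than Top-1 accuracy, since any incident counted as a hit at k=1 is also a hit at k=3.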

Machine Learning (ML) systems are vulnerable to adversarial examples, particularly those from query-based black-box attacks. Despite various efforts to detect and prevent such attacks, ML systems remain at risk, which calls for a more comprehensive approach to security that includes logging, analyzing, and sharing evidence. While traditional security benefits from well-established practices of forensics and threat intelligence sharing, ML security has yet to find a way to profile its attackers and share information about them. In response, this paper introduces SEA, a novel ML security system that characterizes black-box attacks on ML systems for forensic purposes and facilitates human-explainable intelligence sharing. SEA leverages Hidden Markov Models to attribute the observed query sequence to known attacks. It thus captures the attack's progression rather than focusing solely on the final adversarial examples. Our evaluations reveal that SEA is effective at attack attribution, even when an attack is seen for only the second time, and is robust to adaptive strategies designed to evade forensic analysis. SEA's explanations of the attack's behavior even allow us to fingerprint specific minor bugs in widely used attack libraries. For example, we discover that the SignOPT and Square attacks in ART v1.14 send over 50% duplicated queries. We thoroughly evaluate SEA in a variety of settings and demonstrate that it can recognize the same attack with more than 90% Top-1 and 95% Top-3 accuracy. Finally, we demonstrate how SEA generalizes to other domains such as text classification.
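The core idea of attributing a query sequence with Hidden Markov Models can be sketched as follows: keep one HMM per known attack, score the observed sequence under each model with the forward algorithm, and rank attacks by likelihood. This is a toy illustration under stated assumptions, not SEA's implementation. The abstract does not say how queries are turned into observations, so the three discrete symbols, the model parameters, and the names `attack_a` and `attack_b` are all hypothetical.

```python
import math


def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-emission HMM.

    obs:   non-empty list of observation symbols (ints).
    start: start[s] is the probability of beginning in state s.
    trans: trans[p][s] is the probability of moving from state p to s.
    emit:  emit[s][o] is the probability of emitting symbol o in state s.
    """
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return logsumexp(alpha)


def attribute(obs, models, k=3):
    """Return up to k attack names, ordered from most to least likely."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical symbols: 0 = small step, 1 = large jump, 2 = duplicate query.
# Hypothetical per-attack models as (start, trans, emit) tuples.
MODELS = {
    "attack_a": (
        [0.9, 0.1],
        [[0.8, 0.2], [0.3, 0.7]],
        [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    ),
    "attack_b": (
        [0.5, 0.5],
        [[0.5, 0.5], [0.5, 0.5]],
        [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6]],
    ),
}

ranked = attribute([2, 2, 2, 0, 2], MODELS)  # a duplicate-heavy trace
```

In this toy setup, a trace dominated by duplicate queries ranks `attack_b` first because that model assigns high emission probability to the duplicate symbol. The same mechanism suggests how a behavior such as a high duplicate-query rate could act as a fingerprint for an implementation.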

