LG · CR · CV · ML · Jun 26, 2019

Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks

arXiv:1906.10908v2 · 195 citations
Originality: Highly original
AI Analysis

The paper addresses the threat that model functionality stealing attacks pose to the business models of cloud prediction APIs. It proposes a novel active defense rather than an incremental improvement on existing passive defenses.

The paper tackles model stealing attacks that target deep neural networks through black-box access. It proposes a defense that actively perturbs predictions to poison the attacker's training objective, amplifying the attacker's error rate by up to 85 times with minimal impact on utility for benign users.

High-performance Deep Neural Networks (DNNs) are increasingly deployed in many real-world applications, e.g., cloud prediction APIs. Recent advances in model functionality stealing attacks via black-box access (i.e., inputs in, predictions out) threaten the business model of such applications, which require a lot of time, money, and effort to develop. Existing defenses take a passive role against stealing attacks, such as by truncating predicted information. We find such passive defenses ineffective against DNN stealing attacks. In this paper, we propose the first defense which actively perturbs predictions targeted at poisoning the training objective of the attacker. We find our defense effective across a wide range of challenging datasets and DNN model stealing attacks, and find that it additionally outperforms existing defenses. Our defense is the first that can withstand highly accurate model stealing attacks for tens of thousands of queries, amplifying the attacker's error rate by a factor of up to 85$\times$ with minimal impact on the utility for benign users.
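
To make the contrast between passive and active defenses concrete, here is a minimal sketch, not the paper's algorithm. It assumes the API returns a probability vector per query. `truncate_top1` illustrates a passive defense that truncates the returned information to the top-1 label. `perturb_within_budget` illustrates the general shape of an active defense: the returned vector is moved away from the true posterior while staying a valid distribution within an L1 budget. The function names, the `target` distribution, and the `epsilon` budget are all hypothetical. The paper chooses its perturbation by targeting the attacker's training objective, which this sketch does not attempt.

```python
import numpy as np


def truncate_top1(probs: np.ndarray) -> np.ndarray:
    """Passive defense (illustrative): reveal only the top-1 label as a one-hot vector."""
    out = np.zeros_like(probs)
    out[np.argmax(probs)] = 1.0
    return out


def perturb_within_budget(probs: np.ndarray, target: np.ndarray, epsilon: float) -> np.ndarray:
    """Active perturbation (illustrative): move `probs` toward a hypothetical
    `target` distribution so that the L1 distance to `probs` is at most `epsilon`.

    The result is a convex combination of two probability vectors, so it is
    still non-negative and sums to 1.
    """
    direction = target - probs
    l1 = np.abs(direction).sum()
    if l1 == 0.0:
        return probs.copy()
    # Step size is capped so the total L1 change never exceeds the budget.
    alpha = min(1.0, epsilon / l1)
    return probs + alpha * direction


if __name__ == "__main__":
    posterior = np.array([0.7, 0.2, 0.1])   # true model output for one query
    target = np.array([0.0, 0.0, 1.0])      # hypothetical misleading target
    print(truncate_top1(posterior))
    print(perturb_within_budget(posterior, target, epsilon=0.5))
```

In this sketch the budget `epsilon` is the knob that trades off benign utility against the noise an attacker sees in the labels it trains on. A small budget keeps the returned vector close to the true posterior.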

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
