PINCH: An Adversarial Extraction Attack Framework for Deep Learning Models
This work addresses the threat of model theft facing deep learning practitioners, but it is incremental: it builds on the existing extraction attack literature, adding a new framework for analysis.
The paper tackles adversarial extraction attacks on deep learning models by presenting PINCH, an automated framework for designing and analyzing such attacks. Experiments on 21 model architectures show that certain model configurations are resilient to specific attacks, that even partial extraction enables further attacks, and that stolen models capture similar knowledge despite differences in expressive power.
Adversarial extraction attacks constitute an insidious threat against Deep Learning (DL) models, in which an adversary aims to steal the architecture, parameters, and hyper-parameters of a targeted DL model. The existing extraction attack literature has observed varying levels of attack success for different DL models and datasets, yet the underlying causes of their susceptibility often remain unclear; understanding these causes would help facilitate the creation of secure DL systems. In this paper, we present PINCH: an efficient and automated extraction attack framework capable of designing, deploying, and analyzing extraction attack scenarios across heterogeneous hardware platforms. Using PINCH, we perform an extensive experimental evaluation of extraction attacks against 21 model architectures to explore new extraction attack scenarios and further attack staging. Our findings show that (1) there are key extraction characteristics whereby particular model configurations exhibit strong resilience against specific attacks, (2) even partial extraction success enables further staging for other adversarial attacks, and (3) equivalent stolen models uncover differences in expressive power, yet exhibit similar captured knowledge.
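To make the extraction setting described above concrete, the following is a minimal sketch of a generic query-based extraction attack. It is not PINCH's implementation, which this summary does not describe. It assumes a toy victim (a secret logistic regression model) that the adversary can only query for output probabilities. All names (`victim_predict`, `extract_surrogate`, `fidelity`) and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def victim_predict(x, w_secret, b_secret):
    """Stand-in for a black-box prediction API (assumption: it returns
    the probability of the positive class for each input row)."""
    return sigmoid(x @ w_secret + b_secret)


def extract_surrogate(query_fn, n_queries, dim, rng, steps=2000, lr=0.5):
    """Train a surrogate using only the victim's responses to queries.

    The adversary draws random inputs, records the victim's soft labels,
    and fits a logistic model to them by gradient descent on cross-entropy.
    """
    x = rng.normal(size=(n_queries, dim))  # adversary-chosen queries
    y = query_fn(x)                        # victim's soft labels
    w = np.zeros(dim)
    b = 0.0
    for _ in range(steps):
        residual = sigmoid(x @ w + b) - y  # gradient of the loss w.r.t. logits
        w -= lr * (x.T @ residual) / n_queries
        b -= lr * residual.mean()
    return w, b


def fidelity(query_fn, w, b, dim, rng, n_test=2000):
    """Fraction of fresh inputs on which surrogate and victim agree."""
    x = rng.normal(size=(n_test, dim))
    victim_labels = query_fn(x) >= 0.5
    surrogate_labels = (x @ w + b) >= 0.0
    return float(np.mean(victim_labels == surrogate_labels))


# Toy usage: the secret parameters are known only to the "victim" side.
rng = np.random.default_rng(0)
w_secret, b_secret = rng.normal(size=5), 0.2
api = lambda x: victim_predict(x, w_secret, b_secret)

w_stolen, b_stolen = extract_surrogate(api, n_queries=500, dim=5, rng=rng)
print("label agreement with victim:", fidelity(api, w_stolen, b_stolen, 5, rng))
```

Agreement between the surrogate and the victim on fresh inputs (often called fidelity) is one common way to quantify extraction success. In this toy setting the surrogate has the same form as the victim; against real DL models the adversary generally does not know the architecture, which is part of what makes the attack outcomes studied in the paper vary.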