Fault Injection and Safe-Error Attack for Extraction of Embedded Neural Network Models
This work addresses security threats to IoT devices by demonstrating a physical attack that efficiently extracts models, though it is incremental in that it applies an existing fault injection method to a new context.
The paper tackles model extraction from embedded neural networks on 32-bit microcontrollers using a Safe Error Attack. It recovers at least 90% of the most significant bits with about 1500 crafted inputs, which allows a substitute model trained with only 8% of the dataset to achieve near-identical accuracy.
Model extraction has emerged as a critical security threat, with attack vectors exploiting both algorithmic and implementation-based approaches. The main goal of an attacker is to steal as much information as possible about a protected victim model, so that they can mimic it with a substitute model, even with limited access to similar training data. Recently, physical attacks such as fault injection have shown worrying efficiency against the integrity and confidentiality of embedded models. We focus on embedded deep neural network models on 32-bit microcontrollers, a widespread family of hardware platforms in IoT, and on the use of a standard fault injection strategy, Safe Error Attack (SEA), to perform a model extraction attack by an adversary with limited access to training data. Since the attack strongly depends on the input queries, we propose a black-box approach to craft a successful attack set. For a classical convolutional neural network, we successfully recover at least 90% of the most significant bits with about 1500 crafted inputs. This information makes it possible to efficiently train a substitute model, with only 8% of the training dataset, that reaches high fidelity and a near-identical accuracy level to that of the victim model.
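The principle behind a safe-error attack can be illustrated with a small simulation. The sketch below is not the paper's implementation. It assumes a bit-set fault model (the fault forces one stored bit to 1), signed 8-bit weights, and a toy linear classifier in place of a convolutional network. All function names are hypothetical. If the faulted model's prediction differs from the original for some input, the targeted bit must have been 0. If no input reveals a difference, the bit is guessed to be 1, which can be wrong when the fault is masked. This is why the choice of input queries matters.

```python
def to_signed8(byte):
    """Interpret an unsigned byte (0..255) as a two's-complement int8."""
    return byte - 256 if byte >= 128 else byte

def predict(weights, x):
    """Predicted class of a toy linear classifier (stand-in for the victim model)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)

def bit_set(weights, row, col, bit):
    """Copy the weights and force one bit of one weight to 1 (assumed fault model)."""
    faulted = [list(r) for r in weights]
    byte = (faulted[row][col] & 0xFF) | (1 << bit)
    faulted[row][col] = to_signed8(byte)
    return faulted

def safe_error_guess(weights, inputs, row, col, bit):
    """Guess one weight bit from prediction changes only."""
    faulted = bit_set(weights, row, col, bit)
    for x in inputs:
        if predict(weights, x) != predict(faulted, x):
            return 0  # the fault was visible, so the bit was 0
    return 1  # no visible effect: bit already 1, or the fault was masked

def true_bit(weights, row, col, bit):
    """Ground truth, used only to check the guesses in this simulation."""
    return ((weights[row][col] & 0xFF) >> bit) & 1

# Illustrative values only; a real attack would inject faults on the device.
weights = [[100, -20], [-50, 60]]
inputs = [[1, 0], [0, 1], [1, 1]]
msb_guesses = [[safe_error_guess(weights, inputs, r, c, 7) for c in range(2)]
               for r in range(2)]
print(msb_guesses)
```

In this simulation a guess of 0 is always correct, while a guess of 1 is only as reliable as the input set: the more inputs that make a faulted weight change the prediction, the fewer masked faults remain.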