A Model Stealing Attack Against Multi-Exit Networks
This work exposes a security vulnerability affecting users of multi-exit networks: an attacker can extract a model that retains the victim's computational efficiency as well as its accuracy. The contribution is incremental in that it builds on existing model stealing attacks.
The paper tackles model stealing against multi-exit networks. Prior attacks fail to capture the output strategy; the proposed method extracts both the model utility and the output strategy, and in the reported experiments it achieves accuracy and efficiency closest to those of the victim models.
Compared to traditional neural networks with a single output channel, a multi-exit network has multiple exits that allow for early outputs from the model's intermediate layers, thus significantly improving computational efficiency while maintaining similar main task accuracy. Existing model stealing attacks can only steal the model's utility while failing to capture its output strategy, i.e., the set of thresholds used to determine from which exit to output. This leads to a significant decrease in computational efficiency for the extracted model, which thereby loses the advantage of multi-exit networks. In this paper, we propose the first model stealing attack against multi-exit networks to extract both the model utility and the output strategy. We employ Kernel Density Estimation to analyze the target model's output strategy and use performance loss and strategy loss to guide the training of the extracted model. Furthermore, we design a novel output strategy search algorithm to maximize the consistency between the output behaviors of the victim model and the extracted model. In experiments across multiple multi-exit networks and benchmark datasets, our method always achieves accuracy and efficiency closest to the victim models.
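To make the notion of an output strategy concrete, the following is a minimal sketch of threshold-based early exiting. It assumes each exit emits a class-probability vector and that an input leaves at the first exit whose top-class confidence reaches that exit's threshold; the abstract only says the strategy is a set of thresholds, so this confidence rule, the function names `choose_exit` and `exit_distribution`, and all numeric values are illustrative assumptions rather than the paper's implementation.

```python
from typing import List, Sequence, Tuple


def choose_exit(
    exit_probs: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> Tuple[int, int]:
    """Return (exit_index, predicted_class) for a single input.

    exit_probs[i] is the class-probability vector produced at exit i.
    thresholds[i] is the confidence required to stop at exit i (assumed
    rule). The final exit has no threshold and always answers.
    """
    if len(thresholds) != len(exit_probs) - 1:
        raise ValueError("need exactly one threshold per non-final exit")
    last = len(exit_probs) - 1
    for i, probs in enumerate(exit_probs):
        confidence = max(probs)
        if i == last or confidence >= thresholds[i]:
            return i, list(probs).index(confidence)
    raise ValueError("exit_probs must not be empty")


def exit_distribution(
    batch: Sequence[Sequence[Sequence[float]]],
    thresholds: Sequence[float],
) -> List[float]:
    """Fraction of inputs leaving at each exit under the given thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for exit_probs in batch:
        exit_index, _ = choose_exit(exit_probs, thresholds)
        counts[exit_index] += 1
    return [c / len(batch) for c in counts]


# Hypothetical three-exit model evaluated on four inputs (two classes).
batch = [
    [[0.90, 0.10], [0.95, 0.05], [0.99, 0.01]],  # confident at exit 0
    [[0.60, 0.40], [0.20, 0.80], [0.10, 0.90]],  # confident at exit 1
    [[0.50, 0.50], [0.55, 0.45], [0.30, 0.70]],  # falls through to exit 2
    [[0.88, 0.12], [0.90, 0.10], [0.95, 0.05]],  # confident at exit 0
]
victim_thresholds = [0.85, 0.75]   # assumed victim strategy
no_early_exit = [1.01, 1.01]       # thresholds that can never be met

print(exit_distribution(batch, victim_thresholds))
print(exit_distribution(batch, no_early_exit))
```

Under the assumed victim thresholds, half of the inputs leave at the first exit, whereas with unreachable thresholds every input runs through the whole network. This mirrors the efficiency loss described above: an extracted model can match the victim's predictions yet still be slower if its exit distribution differs from the victim's.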