DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks
The work addresses IP theft risks for quantized models deployed on edge devices by integrating a defense directly into training. The contribution is incremental in that it builds on existing quantization-aware training.
The paper tackles the vulnerability of quantized convolutional neural networks to model extraction attacks. It proposes DivQAT, a novel algorithm that integrates a defense into the quantization-aware training process, and reports that it defends against these attacks without compromising accuracy on benchmark vision datasets.
Convolutional Neural Networks (CNNs) and their quantized counterparts are vulnerable to extraction attacks, posing a significant threat of IP theft. Yet the robustness of quantized models against these attacks has been studied far less than that of large models. Previous defenses inject calculated noise into the prediction probabilities. However, these defenses are limited because they are not incorporated into the model design and are only added as an afterthought once training is complete. Additionally, most defense techniques are computationally expensive and often make assumptions about the victim model that are unrealistic for edge device implementations and do not apply to quantized models. In this paper, we propose DivQAT, a novel algorithm to train quantized CNNs based on Quantization-Aware Training (QAT), aiming to enhance their robustness against extraction attacks. To the best of our knowledge, our technique is the first to modify the quantization process to integrate a model extraction defense into the training process. Through empirical validation on benchmark vision datasets, we demonstrate the efficacy of our technique in defending against model extraction attacks without compromising model accuracy. Furthermore, combining our quantization technique with other defense mechanisms improves their effectiveness compared to traditional QAT.
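For context, the quantization step that standard QAT simulates in the forward pass can be sketched as below. This is a generic symmetric uniform "fake quantization" of a weight list, not DivQAT itself: the abstract does not specify how DivQAT modifies the quantization process, so none of that is shown. The function name `fake_quantize` and the `num_bits` parameter are hypothetical, chosen for illustration only, and the backward pass (commonly handled with a straight-through estimator) is omitted.

```python
def fake_quantize(weights, num_bits=8):
    """Quantize then dequantize floats on a symmetric uniform grid.

    Illustrative only: a generic QAT-style forward step, not the
    DivQAT algorithm described in the paper.
    """
    if not weights:
        return [], 1.0
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    # Scale maps the largest magnitude onto the largest integer level.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = round(w / scale)                     # nearest integer level
        q = max(-qmax, min(qmax, q))             # clamp to the representable range
        dequantized.append(q * scale)            # back to float for training
    return dequantized, scale


if __name__ == "__main__":
    values, step = fake_quantize([0.5, -1.0, 0.26, 0.0], num_bits=4)
    print("grid step:", step)
    print("quantized weights:", values)
```

Each returned weight sits on a grid with spacing `scale`, so the rounding error per weight is at most half a grid step. During training, the network sees these rounded values and learns weights that tolerate the precision loss.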