CR | AI | Mar 16, 2025

Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy

arXiv:2503.12497v1 | 2 citations | h-index: 3 | AAAI
Originality: Incremental advance
AI Analysis

This paper addresses security for commercial AI model providers by defending against low-cost model replication attacks. The contribution is incremental: it builds on existing detection and prediction-poisoning techniques.

The paper tackles model-stealing attacks where malicious users replicate commercial models using query responses, proposing a defense method that achieves strong protection with minimal impact on benign users in image classification tasks.

Malicious users attempt to functionally replicate commercial models at low cost by training a clone model on query responses. Preventing such model-stealing attacks in a timely manner, while achieving strong protection and maintaining utility, is challenging. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) to recognize queries from malicious users by leveraging account-wise local dependency. We formulate each class as a multivariate normal (MVN) distribution in the feature space and measure the malicious score as the sum of weighted class-wise distribution discrepancies. The ADD detector is combined with random-based prediction poisoning to yield a plug-and-play defense module named D-ADD for image classification models. Results of extensive experimental studies show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users in both soft-label and hard-label settings.
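
The abstract describes the score only at a high level, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It fits one multivariate normal per class from reference features, then scores an account by summing weighted per-class discrepancies between the account's recent queries and the reference distributions. The choice of KL divergence as the discrepancy, the use of query fractions as weights, the `min_count` cutoff, and all function names are assumptions introduced here; the abstract does not specify them. The poisoning step simply returns a random probability vector, which is one possible reading of "random-based prediction poisoning".

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes, eps=1e-3):
    """Fit one multivariate normal (mean, covariance) per class."""
    dim = features.shape[1]
    stats = []
    for c in range(num_classes):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # A small ridge term keeps the covariance invertible.
        cov = np.cov(x, rowvar=False) + eps * np.eye(dim)
        stats.append((mean, cov))
    return stats

def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """KL(P || Q) between two multivariate normals (assumed discrepancy)."""
    dim = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff
                  - dim + logdet_q - logdet_p)

def malicious_score(account_feats, account_preds, ref_stats,
                    eps=1e-3, min_count=5):
    """Sum of weighted class-wise discrepancies for one account's queries.

    Weights are the fraction of the account's queries predicted as each
    class (an assumption); classes with too few queries are skipped.
    """
    total = len(account_preds)
    dim = account_feats.shape[1]
    score = 0.0
    for c, (ref_mean, ref_cov) in enumerate(ref_stats):
        x = account_feats[account_preds == c]
        if len(x) < min_count:
            continue
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + eps * np.eye(dim)
        weight = len(x) / total
        score += weight * gaussian_kl(mean, cov, ref_mean, ref_cov)
    return score

def poison_prediction(probs, rng):
    """Replace a prediction with a random probability vector (assumed form)."""
    noise = rng.random(probs.shape[0])
    return noise / noise.sum()
```

In use, a provider would compare `malicious_score` for each account against a threshold chosen on benign traffic and call `poison_prediction` only for accounts above it, so benign users keep receiving unmodified outputs.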

Code Implementations: 1 repo
