LG · Jan 4

DiMEx: Breaking the Cold Start Barrier in Data-Free Model Extraction via Latent Diffusion Priors

arXiv:2601.01688v1
Originality: Highly original
AI Analysis

The paper addresses a stealthy threat to ML-as-a-Service by making model stealing more query-efficient. The contribution is incremental, however, since it builds on existing DFME methods.

The paper tackles the cold start problem in data-free model extraction by using latent diffusion priors to generate high-fidelity queries, achieving 52.1% agreement on SVHN with 2,000 queries and outperforming GAN baselines by over 16%. It also proposes a defense that reduces attack success rates to 21.6%.
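The summary reports "agreement" without defining it. A common definition in model-extraction work, assumed here, is the fraction of held-out test inputs on which the clone's top-1 prediction matches the victim's. Under that assumption, 52.1% agreement means the clone reproduces the victim's label on roughly half of the SVHN test images. A minimal NumPy sketch of that metric, with made-up logits for illustration:

```python
import numpy as np


def agreement(victim_logits, clone_logits):
    # Fraction of inputs where the clone's top-1 label equals the victim's
    # (assumed definition; the summary does not spell it out).
    victim_labels = np.argmax(victim_logits, axis=1)
    clone_labels = np.argmax(clone_logits, axis=1)
    return float(np.mean(victim_labels == clone_labels))


# Illustrative logits for 4 inputs and 3 classes (not data from the paper).
victim = np.array([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 3.0], [1.0, 0.9, 0.0]])
clone = np.array([[1.0, 0.5, 0.2], [0.9, 0.3, 0.1], [0.0, 0.1, 2.0], [0.8, 0.7, 0.1]])
print(agreement(victim, clone))  # 0.75
```

The clone disagrees only on the second input (it predicts class 0 where the victim predicts class 1), so three of four labels match.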

Model stealing attacks pose an existential threat to Machine Learning as a Service (MLaaS), allowing adversaries to replicate proprietary models for a fraction of their training cost. While Data-Free Model Extraction (DFME) has emerged as a stealthy vector, it remains fundamentally constrained by the "Cold Start" problem: GAN-based adversaries waste thousands of queries converging from random noise to meaningful data. We propose DiMEx, a framework that weaponizes the rich semantic priors of pre-trained Latent Diffusion Models to bypass this initialization barrier entirely. By employing Random Embedding Bayesian Optimization (REMBO) within the generator's latent space, DiMEx synthesizes high-fidelity queries immediately, achieving 52.1 percent agreement on SVHN with just 2,000 queries and outperforming state-of-the-art GAN baselines by over 16 percent. To counter this highly semantic threat, we introduce the Hybrid Stateful Ensemble (HSE) defense, which identifies the unique "optimization trajectory" of latent-space attacks. Our results demonstrate that while DiMEx evades static distribution detectors, HSE exploits this temporal signature to suppress attack success rates to 21.6 percent with negligible latency.
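The abstract names REMBO but does not say how DiMEx configures it. As a rough illustration only, the sketch below implements textbook REMBO in NumPy. A random Gaussian matrix `A` maps a low-dimensional point `y` into the high-dimensional box [-1, 1]^D, and a Gaussian-process surrogate with an upper-confidence-bound (UCB) acquisition picks the next `y` to evaluate. The RBF kernel, the UCB rule, the random-candidate search, the box sizes, and `toy_objective` are all assumptions. In DiMEx the high-dimensional vector would presumably be the diffusion model's latent and the objective some attack score, and neither is modeled here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a (n x d) and b (m x d).
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(y_train, f_train, y_query, length_scale=1.0, noise=1e-4):
    # GP regression on standardized targets; returns posterior mean and std
    # on that standardized scale (sufficient for ranking candidates).
    f = (f_train - f_train.mean()) / (f_train.std() + 1e-9)
    K = rbf_kernel(y_train, y_train, length_scale) + noise * np.eye(len(y_train))
    K_s = rbf_kernel(y_train, y_query, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)  # RBF prior variance is 1
    return mean, np.sqrt(var)


def rembo_maximize(objective, high_dim, low_dim, n_init=5, n_iter=30,
                   n_candidates=500, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((high_dim, low_dim))  # random embedding matrix
    bound = np.sqrt(low_dim)  # low-dim search box [-sqrt(d), sqrt(d)]^d

    def lift(y):
        # Map y into the high-dim space and project onto the box [-1, 1]^D.
        return np.clip(A @ y, -1.0, 1.0)

    Y = rng.uniform(-bound, bound, size=(n_init, low_dim))
    F = np.array([objective(lift(y)) for y in Y])
    for _ in range(n_iter):
        # Score random candidates with GP-UCB and evaluate the best one.
        cand = rng.uniform(-bound, bound, size=(n_candidates, low_dim))
        mean, std = gp_posterior(Y, F, cand)
        y_next = cand[np.argmax(mean + kappa * std)]
        Y = np.vstack([Y, y_next])
        F = np.append(F, objective(lift(y_next)))
    best = int(np.argmax(F))
    return lift(Y[best]), float(F[best])


def toy_objective(x):
    # Hypothetical black-box score that depends on only 2 of the D coordinates.
    return -(x[0] - 0.5) ** 2 - (x[1] + 0.3) ** 2


x_best, f_best = rembo_maximize(toy_objective, high_dim=100, low_dim=4)
print(x_best.shape, round(f_best, 3))
```

The toy objective varies along only a few directions of the 100-dimensional space, so a 4-dimensional search over `y` can still reach good points. REMBO relies on this low effective dimensionality. Whether a diffusion latent space satisfies it is a question for the paper's own experiments.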
