CR, AI, CV, LG | Jan 23, 2022

Increasing the Cost of Model Extraction with Calibrated Proof of Work

arXiv:2201.09243v3 | 37 citations
AI Analysis

This paper addresses model stealing for machine learning practitioners with a defense that avoids a tradeoff with model utility for legitimate users. It is incremental in that it builds on existing proof-of-work and differential privacy concepts.

The paper tackles model extraction attacks by requiring users to complete a calibrated proof-of-work before accessing predictions, increasing attacker effort up to 100x while adding only up to 2x overhead for regular users.

In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries, truncating, or distorting outputs, thus necessarily introducing a tradeoff between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this only introduces a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.
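
The idea of tying puzzle difficulty to the information a query reveals can be illustrated with a minimal hashcash-style sketch. Everything below is an assumption made for illustration, not the paper's implementation: the function names, the SHA-256 puzzle, the `base_bits` and `max_bits` parameters, and especially the `difficulty_bits` mapping are hypothetical. The per-query privacy cost is taken as a given input and is not computed here.

```python
import hashlib
import math
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def difficulty_bits(privacy_cost: float, base_bits: int = 8, max_bits: int = 20) -> int:
    """Hypothetical calibration: a query that reveals more gets a harder puzzle.

    Expected work doubles with every extra bit, so adding log2(1 + cost) bits
    makes expected work grow roughly linearly in the cost. This mapping is an
    assumption for illustration, not the calibration used in the paper.
    """
    extra = math.log2(1.0 + max(privacy_cost, 0.0))
    return min(max_bits, base_bits + round(extra))


def solve(challenge: bytes, bits: int) -> int:
    """Client side: brute-force a nonce whose hash has at least `bits` leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= bits


if __name__ == "__main__":
    challenge = os.urandom(16)                # fresh random challenge per query
    bits = difficulty_bits(privacy_cost=3.0)  # privacy cost assumed to be supplied by the server
    nonce = solve(challenge, bits)
    print(bits, verify(challenge, nonce, bits))
```

In this sketch the server would release a prediction only after `verify` succeeds. Solving is expensive for the client while verification is cheap, which is the asymmetry a proof-of-work defense relies on.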

