GTCRLGMay 8

Differentially Private Auditing Under Strategic Response

arXiv:2605.0767434.9
AI Analysis

For regulators auditing AI systems, this work provides a framework to design privacy-preserving audits that account for strategic developer behavior, improving detection of residual harm.

This paper studies audit design for AI systems when developers strategically respond to privacy-constrained audits, showing that naive DP auditing leads to larger welfare-weighted under-detection gaps. It proposes a bilevel game formulation and an algorithm (SPAD) to optimize audit allocation.

Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the welfare-weighted under-detection gap $B_w$, the welfare-weighted true residual harm the audit fails to detect at the developer's strategic best response, and prove that naive DP auditing (uniform or harm-proportional allocation) induces a strictly larger $B_w$ than any non-strategic mitigation baseline whenever effective detectability is heterogeneous, the welfare weights are not comonotone with detectability, and the developer's optimum is interior. We characterize the optimal auditor allocation as a four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature, and provide a single-level reformulation of the bilevel problem via the developer's KKT system. We propose Strategic Private Audit Design (SPAD), a projected-gradient algorithm with hypergradients computed through the developer's best response.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes