CRAI | May 13, 2025

On the interplay of Explainability, Privacy and Predictive Performance with Explanation-assisted Model Extraction

arXiv:2505.08847v1 | h-index: 17 | xAI
Originality: Incremental advance
AI Analysis

This work addresses a critical security and privacy problem for MLaaS providers and users, focusing on the incremental challenge of integrating XAI with privacy protections.

The paper tackles the privacy risks of model extraction attacks (MEA) on Machine Learning as a Service (MLaaS) platforms, particularly when attackers exploit counterfactual explanations (CFs) from explainable AI (XAI). It investigates the trade-offs among predictive performance, privacy, and explainability by evaluating two Differential Privacy (DP) strategies for mitigating these attacks: one applied during model training and another during CF generation.

Machine Learning as a Service (MLaaS) has gained significant traction as a means of deploying powerful predictive models, offering ease of use that enables organizations to leverage advanced analytics without substantial investments in specialized infrastructure or expertise. However, MLaaS platforms must be safeguarded against security and privacy attacks, such as model extraction attacks (MEA). The increasing integration of explainable AI (XAI) within MLaaS has introduced an additional privacy challenge, as attackers can exploit model explanations, particularly counterfactual explanations (CFs), to facilitate MEA. In this paper, we investigate the trade-offs among model performance, privacy, and explainability when employing Differential Privacy (DP), a promising technique for mitigating CF-facilitated MEA. We evaluate two distinct DP strategies: one implemented during the training of the classification model, and one applied at the explainer during CF generation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
