On the interplay of Explainability, Privacy and Predictive Performance with Explanation-assisted Model Extraction
This work addresses a critical security and privacy problem for MLaaS providers and users, focusing on the additional challenge of integrating XAI with privacy protections.
The paper tackles the privacy risks of model extraction attacks (MEA) on Machine Learning as a Service (MLaaS) platforms, particularly when attackers exploit counterfactual explanations (CFs) produced by explainable AI (XAI). It investigates the trade-offs among predictive performance, privacy, and explainability by evaluating two Differential Privacy (DP) strategies for mitigating these attacks: one applied during model training and the other during CF generation.
Machine Learning as a Service (MLaaS) has gained significant traction as a means of deploying powerful predictive models, offering ease of use that enables organizations to leverage advanced analytics without substantial investments in specialized infrastructure or expertise. However, MLaaS platforms must be safeguarded against security and privacy attacks, such as model extraction attacks (MEA). The increasing integration of explainable AI (XAI) within MLaaS has introduced an additional privacy challenge, as attackers can exploit model explanations, particularly counterfactual explanations (CFs), to facilitate MEA. In this paper, we investigate the trade-offs among model performance, privacy, and explainability when employing Differential Privacy (DP), a promising technique for mitigating CF-facilitated MEA. We evaluate two distinct DP strategies: one implemented during the training of the classification model and the other at the explainer during CF generation.
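To make the second strategy concrete, the following is a minimal sketch of perturbing a counterfactual with Laplace noise before it is returned to the querying user. It is an illustration under stated assumptions, not the explainer evaluated in this paper: the helper names (`laplace_noise`, `privatize_counterfactual`), the per-feature sensitivity of 1.0, and the feature values are all hypothetical. It shows only how the noise scale grows as the privacy budget epsilon shrinks, and does not by itself establish a formal DP guarantee for a CF generator.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_counterfactual(cf, sensitivity, epsilon, rng):
    """Add independent Laplace noise to every feature of a counterfactual.

    Hypothetical helper: scale = sensitivity / epsilon is the usual
    Laplace-mechanism calibration. The sensitivity value is an assumption
    and would have to be derived for a real CF generator.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in cf]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
cf = [0.42, 0.10, 0.77]  # made-up counterfactual, features scaled to [0, 1]

# Small epsilon: strong privacy, heavily perturbed (less useful) explanation.
strong_privacy = privatize_counterfactual(cf, sensitivity=1.0, epsilon=0.1, rng=rng)
# Large epsilon: weak privacy, explanation stays close to the original CF.
weak_privacy = privatize_counterfactual(cf, sensitivity=1.0, epsilon=10.0, rng=rng)

print(strong_privacy)
print(weak_privacy)
```

The two calls illustrate the tension studied here: a smaller epsilon makes the returned CF less informative to an attacker running MEA, but also less faithful as an explanation for a legitimate user.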