LG, Sep 28, 2023

On the Trade-offs between Adversarial Robustness and Actionable Explanations

Harvard
arXiv:2309.16452v2
1 citation
h-index: 43
Originality: Incremental advance
AI Analysis

This paper addresses a critical issue for deploying ML in high-stakes settings by revealing inherent trade-offs between robustness and explainability. The advance is incremental, since it builds on existing methods.

The paper tackles the problem of whether adversarially robust machine learning models can simultaneously provide actionable explanations for recourse, finding that robust models significantly increase the cost and reduce the validity of recourses, with theoretical bounds and empirical validation on real-world datasets.

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.
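To make the two quantities in the abstract concrete, the following is a minimal sketch for a linear classifier. It is not the paper's algorithm: the function names, the L2 distance as the cost measure, the `margin` parameter, and all numbers are assumptions chosen for illustration. The recourse is the closest point to the input that the linear model classifies as positive, cost is the distance moved, and validity is the fraction of recourses that receive a positive prediction.

```python
import numpy as np

def linear_recourse(x, w, b, margin=1e-6):
    """Closest point (in L2) to x whose linear score w.x + b is >= margin.

    Hypothetical helper for illustration; not taken from the paper.
    """
    score = x @ w + b
    if score >= margin:
        return x.copy()  # already predicted positive, no change needed
    # Move along w just far enough to reach the target score.
    step = (margin - score) / (w @ w)
    return x + step * w

def recourse_cost(x, x_cf):
    """Cost of a recourse, measured here as the L2 distance moved (an assumption)."""
    return float(np.linalg.norm(x_cf - x))

def recourse_validity(X_cf, w, b):
    """Fraction of recourses that obtain a positive prediction under (w, b)."""
    return float(np.mean(X_cf @ w + b > 0))

# Illustrative numbers only; they are not results from the paper.
x = np.array([0.0, 0.0])
w = np.array([3.0, 4.0])
b = -5.0

x_cf = linear_recourse(x, w, b)
print("cost:", recourse_cost(x, x_cf))
print("validity under the same model:", recourse_validity(x_cf[None, :], w, b))
# A second, hypothetical model with a stricter intercept rejects the same recourse.
print("validity under a stricter model:", recourse_validity(x_cf[None, :], w, -6.0))
```

The last line shows why validity depends on the model being evaluated: a recourse computed against one decision boundary need not cross another. How adversarial training actually changes the boundary, and by how much cost and validity shift, is what the paper's bounds and experiments address.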
