Rigorous Feature Importance Scores based on Shapley Value and Banzhaf Index
This work provides a more rigorous feature attribution method for high-stakes machine learning applications, though it is incremental as it builds on existing game-theoretic approaches.
The paper tackles feature attribution in explainable AI. Existing rigorous approaches neglect feature sets that are not weak abductive explanations (non-WAXp sets), even though such sets can convey important information related to adversarial examples. The paper introduces two novel feature importance scores, based on the Shapley value and the Banzhaf index, that quantify how effective each feature is at excluding adversarial examples, and it analyzes their properties and computational complexity.
Feature attribution methods based on game theory are ubiquitous in the field of eXplainable Artificial Intelligence (XAI). Recent works proposed rigorous feature attribution using logic-based explanations, specifically targeting high-stakes uses of machine learning (ML) models. Typically, such works use weak abductive explanations (WAXps) to define the characteristic function that assigns importance to features. However, one possible downside is that the contribution of non-WAXp sets is neglected. In fact, non-WAXp sets can also convey important information, because of the relationship between formal explanations (XPs) and adversarial examples (AExs). Accordingly, this paper leverages the Shapley value and the Banzhaf index to devise two novel feature importance scores. The scores take non-WAXp sets into account when computing feature contributions, and they quantify how effective each feature is at excluding AExs. Furthermore, the paper identifies properties of the proposed scores and studies their computational complexity.
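To make the WAXp-based setup concrete, the following is a minimal sketch of the baseline the abstract describes: a characteristic function that is 1 on WAXp sets and 0 elsewhere, fed into the textbook Shapley value and Banzhaf index formulas. It does not implement the two new scores, whose definitions are not given here. The toy classifier `kappa`, the instance `V`, and all function names are hypothetical, and the Banzhaf normalization by 2^(n-1) is the common convention, assumed rather than taken from the paper.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial

N = 3            # number of boolean features (toy assumption)
V = (1, 1, 1)    # instance being explained (toy assumption)

def kappa(x):
    """Hypothetical toy classifier: x0 AND (x1 OR x2)."""
    return int(x[0] and (x[1] or x[2]))

def is_waxp(S):
    """S is a WAXp if fixing the features in S to their values in V
    forces the prediction kappa(V) on every point of feature space."""
    target = kappa(V)
    for x in product((0, 1), repeat=N):
        if all(x[i] == V[i] for i in S) and kappa(x) != target:
            # x agrees with V on S but is classified differently,
            # so S does not suffice to guarantee the prediction.
            return False
    return True

def cf(S):
    """Characteristic function: 1 on WAXp sets, 0 on non-WAXp sets."""
    return int(is_waxp(S))

def subsets(items):
    """Yield every subset of items as a frozenset."""
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)

def shapley(i):
    """Textbook Shapley value of feature i under cf (exponential time)."""
    others = [j for j in range(N) if j != i]
    total = Fraction(0)
    for S in subsets(others):
        weight = Fraction(factorial(len(S)) * factorial(N - len(S) - 1),
                          factorial(N))
        total += weight * (cf(S | {i}) - cf(S))
    return total

def banzhaf(i):
    """Textbook Banzhaf index of feature i under cf, normalized by 2^(N-1)."""
    others = [j for j in range(N) if j != i]
    total = sum(cf(S | {i}) - cf(S) for S in subsets(others))
    return Fraction(total, 2 ** (N - 1))

if __name__ == "__main__":
    for i in range(N):
        print(i, shapley(i), banzhaf(i))
```

In this toy setting every non-WAXp set receives the same value 0, regardless of how many differently classified points it leaves open. That uniform treatment is the kind of information loss the abstract points to when it says the contribution of non-WAXp sets is neglected.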