LG · Apr 16, 2023

Explanations of Black-Box Models based on Directional Feature Interactions

Aria Masoomi, Davin Hill, Zhonghui Xu, Craig P Hersh, Edwin K. Silverman, Peter J. Castaldi, Stratis Ioannidis, Jennifer Dy

arXiv:2304.07670v1 · 18.826 citations · h-index: 135 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more transparent AI by providing enhanced explainability for black-box models, though it is incremental as it builds on existing Shapley value explanations.

The paper extends univariate feature-importance explanations of black-box machine learning models to bivariate ones that capture directional feature interactions, represented as a directed graph. It reports better results than state-of-the-art methods on datasets including CIFAR10 and IMDB.

As machine learning algorithms are deployed ubiquitously across a variety of domains, it is imperative to make these often black-box models transparent. Several recent works explain black-box models by capturing the most influential features for prediction per instance; such explanation methods are univariate, as they characterize importance per feature. We extend univariate explanation to a higher order; this enhances explainability, as bivariate methods can capture feature interactions in black-box models, represented as a directed graph. Analyzing this graph enables us to discover groups of features that are equally important (i.e., interchangeable), while the notion of directionality allows us to identify the most influential features. We apply our bivariate method to Shapley value explanations, and experimentally demonstrate the ability of directional explanations to discover feature interactions. We show the superiority of our method over state-of-the-art methods on CIFAR10, IMDB, Census, Divorce, Drug, and gene data.
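To make the univariate versus bivariate distinction concrete, the following is a minimal sketch and not the authors' implementation. It computes exact Shapley values for a toy set function, then a hypothetical "conditional" variant that measures the importance of feature j when feature i is always present. The names `shapley_values`, `conditional_shapley`, and `toy_value`, and the conditional definition itself, are illustrative assumptions; the paper's actual bivariate formulation may differ.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n):
    """Exact Shapley values of features 0..n-1 for a set function value_fn."""
    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (value_fn(s | {j}) - value_fn(s))
    return phi


def conditional_shapley(value_fn, n, j, i):
    """Shapley value of j in the reduced game where i is always present.

    Illustrative assumption only: one simple way to give an edge i -> j a
    direction-dependent weight, not the definition used in the paper.
    """
    others = [k for k in range(n) if k not in (i, j)]
    m = n - 1  # players of the reduced game: every feature except i
    total = 0.0
    for size in range(m):
        weight = factorial(size) * factorial(m - size - 1) / factorial(m)
        for subset in combinations(others, size):
            s = frozenset(subset) | {i}
            total += weight * (value_fn(s | {j}) - value_fn(s))
    return total


def toy_value(s):
    """Hypothetical model: features 0 and 1 are redundant, feature 2 is additive."""
    return (1.0 if (0 in s or 1 in s) else 0.0) + (2.0 if 2 in s else 0.0)


if __name__ == "__main__":
    n = 3
    print("univariate:", shapley_values(toy_value, n))
    # Directed weights: entry [i][j] is the importance of j given i is present.
    graph = [
        [None if i == j else conditional_shapley(toy_value, n, j, i)
         for j in range(n)]
        for i in range(n)
    ]
    for i, row in enumerate(graph):
        print(f"given feature {i}:", row)
```

In this toy setting, features 0 and 1 receive equal univariate scores, and each loses all of its importance once the other is present, which is the kind of mutual redundancy the abstract describes as interchangeable features. Feature 2 keeps its full importance regardless of which other feature is present.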
