Local Interpretable Model-agnostic Explanations of Bayesian Predictive Models via Kullback-Leibler Projections
This work addresses interpretability for users of Bayesian predictive models. The contribution is incremental, however, since it combines the existing LIME method with existing projection predictive methods.
The paper tackles the problem of explaining predictions from Bayesian predictive models. It introduces KL-LIME, a method that locally projects the predictive distribution onto a simpler, interpretable model, and demonstrates it on MNIST digit classifications made by a Bayesian deep convolutional neural network.
We introduce a method, KL-LIME, for explaining predictions of Bayesian predictive models by projecting the information in the predictive distribution locally onto a simpler, interpretable explanation model. The proposed approach combines the recent Local Interpretable Model-agnostic Explanations (LIME) method with ideas from Bayesian projection predictive variable selection methods. The information-theoretic basis helps navigate the trade-off between explanation fidelity and complexity. We demonstrate the method by explaining MNIST digit classifications made by a Bayesian deep convolutional neural network.
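The local projection idea can be illustrated with a minimal sketch for binary classification. This is not the authors' implementation, and the abstract does not specify these details; everything below is an assumption made for illustration. The function `black_box_predict` is a hypothetical stand-in for the predictive probability of a Bayesian model. Perturbations are drawn from an isotropic Gaussian around the instance and weighted by a Gaussian kernel. The explanation model is a logistic regression fitted by plain gradient descent on the kernel-weighted Kullback-Leibler divergence. The complexity penalty that would control explanation size is omitted for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def black_box_predict(Z):
    # Hypothetical stand-in for a Bayesian model's predictive probability
    # p(y = 1 | z); a real model would average over posterior draws.
    true_w = np.array([2.0, -1.0, 0.0])
    return sigmoid(Z @ true_w + 0.5)

def kl_bernoulli(p, q, eps=1e-12):
    # KL(Bernoulli(p) || Bernoulli(q)), computed elementwise.
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def fit_local_explanation(x, predict, n_samples=500, scale=1.0,
                          kernel_width=1.0, lr=0.5, n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Sample perturbations around the instance x (assumed Gaussian).
    Z = x + scale * rng.normal(size=(n_samples, x.size))
    p = predict(Z)
    # Locality weights from a Gaussian kernel (assumed), normalized to sum to 1.
    d2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-d2 / (2 * kernel_width ** 2))
    w = w / w.sum()
    # Explanation model: logistic regression on intercept + centered features.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    theta = np.zeros(A.shape[1])
    for _ in range(n_steps):
        q = sigmoid(A @ theta)
        # Gradient of the weighted KL with respect to theta.
        theta -= lr * (A.T @ (w * (q - p)))
    loss = float(np.sum(w * kl_bernoulli(p, sigmoid(A @ theta))))
    return theta, loss

x = np.array([0.2, -0.3, 0.5])
theta, loss = fit_local_explanation(x, black_box_predict)
print("explanation coefficients:", np.round(theta, 3))
print("weighted KL at optimum:", loss)
```

Because the stand-in model here is itself logistic, the projection can recover its weights almost exactly and the remaining divergence is close to zero. For a genuinely nonlinear predictive model, the remaining weighted KL would indicate how much information the simpler explanation loses locally.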