On quantitative aspects of model interpretability
The work addresses the challenge that ML researchers and practitioners face in objectively comparing explainability methods, though the contribution is incremental in that it builds on existing concepts from cognitive science.
The paper tackles the problem of evaluating interpretability methods in machine learning by proposing a set of metrics to programmatically assess aspects like simplicity and broadness, and validates these metrics on benchmark tasks to guide practitioners in method selection.
Despite the growing body of work in interpretable machine learning, it remains unclear how to evaluate different explainability methods without resorting to qualitative assessment and user studies. While interpretability is an inherently subjective matter, previous works in cognitive science and epistemology have shown that good explanations do possess aspects that can be objectively judged (apart from fidelity), such as simplicity and broadness. In this paper we propose a set of metrics to programmatically evaluate interpretability methods along these dimensions. In particular, we argue that the performance of methods along these dimensions can be orthogonally imputed to two conceptual parts, namely the feature extractor and the actual explainability method. We experimentally validate our metrics on different benchmark tasks and show how they can be used to guide a practitioner in the selection of the most appropriate method for the task at hand.