NormLime: A New Feature Importance Metric for Explaining Deep Neural Networks
This work addresses the need for better interpretability tools in deep learning, particularly for users requiring class-specific explanations, though it is incremental as it builds on existing LIME-based methods.
The paper tackles the problem of explaining deep neural networks by proposing NormLIME, a new feature importance metric for aggregating local models into global and class-specific interpretations, with a human user study strongly favoring it over other metrics and numerical experiments confirming its effectiveness.
The problem of explaining deep learning models, and model predictions generally, has attracted intensive interest recently. Many successful approaches forgo global approximations in order to provide more faithful local interpretations of the model's behavior. LIME develops multiple interpretable models, each approximating a large neural network on a small region of the data manifold, and SP-LIME aggregates the local models to form a global interpretation. Extending this line of research, we propose a simple yet effective method, NormLIME, for aggregating local models into global and class-specific interpretations. A human user study strongly favored class-specific interpretations created by NormLIME over those produced by other feature importance metrics. Numerical experiments confirm that NormLIME is effective at recognizing important features.
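To make the aggregation idea concrete, the following is a minimal sketch of how the weights of many local linear models might be combined into global and class-specific feature scores. The abstract does not state the aggregation rule, so the details here are assumptions for illustration: each local weight vector is normalized by its L1 norm, each feature's absolute weight is scaled by its normalized share, and the result is averaged over the local models that use the feature. The function names `aggregate_local_weights` and `class_specific_scores` are hypothetical.

```python
import numpy as np


def aggregate_local_weights(weights):
    """Combine local linear-model weights into one score per feature.

    weights: array-like of shape (n_models, n_features); a zero entry means
    the local model does not use that feature.
    Assumed rule (not given in the abstract): L1-normalize each model's
    weights, scale |w_i| by its normalized share, then average over the
    models in which the feature appears.
    """
    w = np.abs(np.asarray(weights, dtype=float))
    l1 = w.sum(axis=1, keepdims=True)
    # Avoid division by zero for local models with all-zero weights.
    safe_l1 = np.where(l1 == 0, 1.0, l1)
    share = w / safe_l1              # each non-empty row sums to 1
    contrib = share * w              # |w_i| weighted by its share
    counts = (w > 0).sum(axis=0)     # number of models using each feature
    totals = contrib.sum(axis=0)
    # Features used by no model receive a score of 0.
    return np.divide(totals, counts, out=np.zeros_like(totals), where=counts > 0)


def class_specific_scores(weights, labels):
    """Apply the same aggregation separately to the local models of each class.

    labels: one class label per local model (hypothetical input describing
    which class each local explanation was fit for).
    """
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    return {
        label.item(): aggregate_local_weights(weights[labels == label])
        for label in np.unique(labels)
    }


if __name__ == "__main__":
    local_weights = [[2.0, 0.0, -2.0], [0.0, 1.0, 3.0]]
    print(aggregate_local_weights(local_weights))
    print(class_specific_scores(local_weights, [0, 1]))
```

Under these assumptions, a feature scores highly only when it carries a large share of the weight in the local models that use it, and restricting the average to one class's local models gives the class-specific view.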