Local-to-Global Logical Explanations for Deep Vision Models
This work addresses the opacity of deep neural networks for image classification by providing interpretable explanations for computer vision users. The contribution is incremental, as it builds on existing explanation methods.
The paper tackles the problem of interpreting deep vision models by introducing local and global explanation methods that generate logical formulas in monotone disjunctive-normal-form (MDNF) based on human-recognizable primitive concepts, showing high fidelity and coverage on challenging vision datasets.
While deep neural networks are extremely effective at classifying images, they remain opaque and hard to interpret. We introduce local and global explanation methods for black-box models that generate explanations in terms of human-recognizable primitive concepts. Both the local explanations for a single image and the global explanations for a set of images are cast as logical formulas in monotone disjunctive-normal-form (MDNF), whose satisfaction guarantees that the model yields a high score on a given class. We also present an algorithm for explaining the classification of examples into multiple classes in the form of a monotone explanation list over primitive concepts. Despite their simplicity and interpretability, we show that the explanations maintain high fidelity and coverage with respect to the black-box models they seek to explain on challenging vision datasets.
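To make the MDNF form concrete, here is a minimal Python sketch of how such an explanation could be represented and checked. It is not the paper's algorithm: the concept names, the class labels, and the helpers `satisfies` and `explain_with_list` are hypothetical, and the first-match reading of an explanation list is an assumption borrowed from ordinary decision lists. An MDNF formula is a disjunction of conjunctions that contain no negated concepts, so it can be stored as a list of concept sets.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A term is a conjunction of positive (un-negated) concepts.
Term = FrozenSet[str]
# An MDNF formula is a disjunction of such terms.
MDNF = List[Term]


def satisfies(formula: MDNF, concepts: Iterable[str]) -> bool:
    """Return True if at least one term is fully contained in the concepts."""
    present = set(concepts)
    return any(term <= present for term in formula)


def explain_with_list(
    rules: List[Tuple[MDNF, str]], concepts: Iterable[str], default: str
) -> str:
    """Return the label of the first satisfied rule (assumed first-match semantics)."""
    present = set(concepts)
    for formula, label in rules:
        if satisfies(formula, present):
            return label
    return default


# Hypothetical formulas: (bed AND lamp) OR (bed AND pillow), and (sink AND mirror).
bedroom: MDNF = [frozenset({"bed", "lamp"}), frozenset({"bed", "pillow"})]
bathroom: MDNF = [frozenset({"sink", "mirror"})]
rules = [(bedroom, "bedroom"), (bathroom, "bathroom")]

print(satisfies(bedroom, {"bed", "lamp", "window"}))
print(explain_with_list(rules, {"sink", "mirror"}, default="unknown"))
```

Because no term contains a negation, adding concepts to an image can never turn a satisfied formula into an unsatisfied one, which is the sense in which the formula is monotone.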