Causal Identification of Sufficient, Contrastive and Complete Feature Sets in Image Classification
This work addresses the problem of providing rigorous and practical explanations for image classification outputs, which is crucial for improving interpretability and trust in AI systems, though it builds incrementally on existing causal and logic-based approaches.
The paper tackles the lack of formal rigor in existing explanation methods for image classifiers by proposing causal explanations that are formally defined, computable via black-box algorithms, and extended with contrastive and complete variants. The result is an efficient implementation that computes all explanation types in about 6 seconds per image on a ResNet50 model, without requiring access to model internals or gradients, or model properties such as monotonicity.
Existing algorithms for explaining the outputs of image classifiers are based on a variety of approaches and produce explanations that lack formal rigor. On the other hand, logic-based explanations are formally and rigorously defined, but their computability relies on strict assumptions about the model that do not hold for image classifiers. In this paper, we show that causal explanations, in addition to being formally and rigorously defined, enjoy the same formal properties as logic-based ones, while still lending themselves to black-box algorithms and being a natural fit for image classifiers. We prove formal properties of causal explanations and introduce contrastive causal explanations for image classifiers. Moreover, we augment the definition of explanation with confidence awareness and introduce complete causal explanations: explanations that are classified with exactly the same confidence as the original image. We implement our definitions, and our experimental results demonstrate that different models have different patterns of sufficiency, contrastiveness, and completeness. Our algorithms are efficiently computable, taking on average 6s per image on a ResNet50 model to compute all types of explanations, and are fully black-box: they need no knowledge of the model, no access to model internals or gradients, and no properties of the model, such as monotonicity.
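To make the three notions concrete, the following is a minimal sketch of how sufficiency, contrastiveness, and completeness of a candidate pixel set might be checked using only black-box queries. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the zero-valued masking baseline, the toy classifier, and the reading of "contrastive" as "sufficient, and removing the pixels changes the label" are all hypothetical choices made for this example. The sketch also omits any search for minimal explanations.

```python
from typing import Callable, List, Sequence, Set, Tuple

Image = Sequence[float]
# A black-box classifier: image in, (label, confidence) out. No internals are used.
Classifier = Callable[[Image], Tuple[int, float]]


def keep_only(image: Image, keep: Set[int], baseline: float = 0.0) -> List[float]:
    """Mask every pixel outside `keep` with an assumed baseline value."""
    return [p if i in keep else baseline for i, p in enumerate(image)]


def remove(image: Image, drop: Set[int], baseline: float = 0.0) -> List[float]:
    """Mask every pixel inside `drop` with an assumed baseline value."""
    return [baseline if i in drop else p for i, p in enumerate(image)]


def is_sufficient(classify: Classifier, image: Image, pixels: Set[int]) -> bool:
    """The pixels alone yield the original label."""
    label, _ = classify(image)
    return classify(keep_only(image, pixels))[0] == label


def is_contrastive(classify: Classifier, image: Image, pixels: Set[int]) -> bool:
    """Assumed reading: sufficient, and removing the pixels changes the label."""
    label, _ = classify(image)
    changed = classify(remove(image, pixels))[0] != label
    return is_sufficient(classify, image, pixels) and changed


def is_complete(classify: Classifier, image: Image, pixels: Set[int],
                tol: float = 1e-9) -> bool:
    """The pixels alone yield the original label with the original confidence."""
    label, conf = classify(image)
    new_label, new_conf = classify(keep_only(image, pixels))
    return new_label == label and abs(new_conf - conf) <= tol


def toy_classifier(image: Image) -> Tuple[int, float]:
    """Hypothetical stand-in for a real model: label 1 if total intensity >= 1."""
    score = sum(image)
    if score >= 1.0:
        return 1, min(1.0, score / 2.0)
    return 0, 1.0 - score / 2.0


image = [1.0, 1.0, 0.0, 0.0]
one_pixel = {0}
two_pixels = {0, 1}
```

With this toy model, `one_pixel` is sufficient but neither contrastive nor complete: the label survives on that pixel alone, but with lower confidence, and the label also survives its removal. `two_pixels` satisfies all three checks. In a real setting, `toy_classifier` would be replaced by a wrapper around the model's prediction call, and the choice of masking baseline would need more care.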