Explaining Visual Models by Causal Attribution
This work addresses the need for more reliable explanations of visual models, though its contribution appears incremental, since it builds on existing causal methods.
The paper tackles the problem of unreliable feature effect estimation in model explanations by proposing a causal attribution approach based on counterfactuals, identifying limitations in current image generative models for this application.
Model explanations based on purely observational data cannot compute the effects of features reliably, because they cannot estimate how altering one factor would affect the others. We argue that explanations should be based on the causal model of the data and on the intervened causal models derived from it, which represent the data distribution subject to interventions. With these models, we can compute counterfactuals: new samples that show how the model reacts to feature changes in our input. We propose a novel explanation methodology based on Causal Counterfactuals and identify the limitations of current Image Generative Models in their application to counterfactual creation.
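To make the distinction between observational perturbation and causal counterfactuals concrete, the following is a minimal sketch on a toy problem, not the paper's method or its image setting. It assumes a hypothetical two-feature linear structural causal model (X1 := U1, X2 := a * X1 + U2) and a hypothetical linear predictor `model`; every name and coefficient here is an illustrative assumption. The counterfactual follows the standard abduction, action, prediction steps.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Toy structural causal model (illustrative assumption):
    X1 := U1,  X2 := a * X1 + U2."""
    a: float = 2.0

    def abduct(self, x1: float, x2: float) -> tuple[float, float]:
        # Abduction: recover the exogenous noise consistent with the observation.
        return x1, x2 - self.a * x1

    def counterfactual(self, x1: float, x2: float, new_x1: float) -> tuple[float, float]:
        # Action: replace the mechanism of X1 with the constant new_x1.
        # Prediction: recompute the descendant X2, keeping its noise fixed.
        _, u2 = self.abduct(x1, x2)
        return new_x1, self.a * new_x1 + u2


def model(x1: float, x2: float) -> float:
    # Hypothetical predictor being explained.
    return x1 + 3.0 * x2


scm = LinearSCM(a=2.0)
x1, x2 = 1.0, 2.5
baseline = model(x1, x2)

# Naive perturbation: change X1 and hold X2 fixed, ignoring that X2 depends on X1.
naive_effect = model(x1 + 1.0, x2) - baseline

# Causal counterfactual: change X1 and propagate the change to X2 through the SCM.
cf_x1, cf_x2 = scm.counterfactual(x1, x2, x1 + 1.0)
causal_effect = model(cf_x1, cf_x2) - baseline

print(f"naive effect of X1:  {naive_effect}")
print(f"causal effect of X1: {causal_effect}")
```

In this toy setting the naive perturbation attributes only the direct contribution of X1, while the counterfactual also accounts for the change that X1 induces in X2. For images, the structural equations would have to be replaced by a generative model able to produce the intervened sample, which is where the limitations noted above become relevant.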