AI
May 12, 2022

Can counterfactual explanations of AI systems' predictions skew lay users' causal intuitions about the world? If so, can we correct for that?

Cambridge
arXiv:2205.06241v2
7 citations
h-index: 42
AI Analysis

This paper addresses an important problem in explainable AI: explanations can inadvertently mislead users about causality, which has implications for transparency and trust in AI systems. The contribution is incremental in that it builds on prior cognitive science findings.

The study investigated whether counterfactual explanations of AI predictions can mislead lay users into believing that the features an AI system uses reflect real-world causal relationships. Across two experiments (total N = 364), it found that such explanations do skew causal intuitions (Experiment 1). It also tested a correction based on warning messages and found that clarifying that AI systems capture correlations, not necessarily causal relationships, can attenuate this effect (Experiment 2).

Counterfactual (CF) explanations have been employed as one of the modes of explainability in explainable AI, both to increase the transparency of AI systems and to provide recourse. Cognitive science and psychology, however, have pointed out that people regularly use CFs to express causal relationships. Most AI systems are only able to capture associations or correlations in data, so interpreting them as causal would not be justified. In this paper, we present two experiments (total N = 364) exploring the effects of CF explanations of AI systems' predictions on lay people's causal beliefs about the real world. In Experiment 1, we found that providing CF explanations of an AI system's predictions does indeed (unjustifiably) affect people's causal beliefs regarding the factors/features the AI uses, and that people are more likely to view them as causal factors in the real world. Inspired by the literature on misinformation and health warning messaging, Experiment 2 tested whether we can correct for the unjustified change in causal beliefs. We found that pointing out that AI systems capture correlations and not necessarily causal relationships can attenuate the effects of CF explanations on people's causal beliefs.
