CV | AI | LG | Apr 28, 2024

Position: Do Not Explain Vision Models Without Context

arXiv:2404.18316v3 | 2 citations | h-index: 35 | ICML
Originality: Synthesis-oriented
AI Analysis

This work highlights a critical flaw in explainable AI (XAI) for computer vision that could affect users who rely on model interpretations in real-world applications. Its contribution is incremental, however, because it builds on existing XAI methods.

The paper argues that current explanation methods for vision models do not incorporate contextual information and therefore cannot interpret relationships between objects, such as whether a stethoscope marks the adjacent person as a doctor or a patient. It proposes new research directions to address this gap.

Does the stethoscope in the picture make the adjacent person a doctor or a patient? This, of course, depends on the contextual relationship of the two objects. If it's obvious, why don't explanation methods for vision models use contextual information? In this paper, we (1) review the most popular methods of explaining computer vision models and point out that they do not take context information into account, (2) show examples of failures of popular XAI methods, (3) provide examples of real-world use cases where spatial context plays a significant role, (4) propose new research directions that may lead to better use of context information in explaining computer vision models, and (5) argue that a change in approach to explanations is needed, from 'where' to 'how'.
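
The 'where' versus 'how' distinction can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scene is a one-dimensional strip of cells, `toy_model` is a hypothetical classifier that decides "doctor" or "patient" purely from whether the stethoscope lies to the left or right of the person, and occlusion-based attribution stands in for a generic perturbation-style explanation method.

```python
from typing import Dict, List, Tuple

# Toy scene: a 1-D strip of cells. 0 = background, 1 = stethoscope, 2 = person.
Scene = List[int]


def toy_model(scene: Scene) -> Dict[str, float]:
    """Hypothetical classifier whose decision depends only on spatial relation."""
    if 1 not in scene or 2 not in scene:
        return {"doctor": 0.0, "patient": 0.0}
    stethoscope_left = scene.index(1) < scene.index(2)
    if stethoscope_left:
        return {"doctor": 1.0, "patient": 0.0}
    return {"doctor": 0.0, "patient": 1.0}


def occlusion_map(scene: Scene) -> Tuple[str, List[float]]:
    """Occlude one cell at a time and record the drop in the predicted class score."""
    scores = toy_model(scene)
    label = max(scores, key=scores.get)
    base = scores[label]
    heat = []
    for i in range(len(scene)):
        occluded = scene[:i] + [0] + scene[i + 1:]
        heat.append(base - toy_model(occluded)[label])
    return label, heat


# Same two objects in the same two cells, with their positions swapped.
scene_a = [0, 1, 0, 2, 0]  # stethoscope left of person
scene_b = [0, 2, 0, 1, 0]  # stethoscope right of person

label_a, heat_a = occlusion_map(scene_a)
label_b, heat_b = occlusion_map(scene_b)

print(label_a, heat_a)
print(label_b, heat_b)
```

In this toy setting the two predictions differ, yet the two attribution maps are identical: each marks the cells where the objects sit and carries no trace of the left/right relation that actually drives the decision.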
