Sanity checks for patch visualisation in prototype-based image classification
This work addresses a critical flaw in the interpretability of prototype-based image classifiers. It matters to researchers and practitioners who rely on these models for trustworthy AI, although the contribution is incremental because it builds on existing methods.
The paper analyses the visualisation methods implemented in ProtoPNet and ProtoTree and shows that they fail to correctly identify the image regions of interest. As a result, the visualisations do not reflect model behaviour and can create a false sense of bias in the model. The paper also demonstrates quantitatively that other saliency methods mitigate this issue by providing more faithful image patches.
In this work, we perform an analysis of the visualisation methods implemented in ProtoPNet and ProtoTree, two self-explaining visual classifiers based on prototypes. We show that such methods do not correctly identify the regions of interest inside the images and therefore do not reflect the model behaviour, which can create a false sense of bias in the model. We also demonstrate quantitatively that this issue can be mitigated by using other saliency methods that provide more faithful image patches.
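To make the notion of a "faithful" image patch concrete, the following is a minimal sketch of one way such a check could look; it is not the evaluation protocol of the paper. It assumes a patch is obtained by thresholding a saliency map at a high percentile and taking the bounding box, and that faithfulness is probed by masking the patch and measuring how much a score drops. The function names `patch_from_saliency` and `score_drop_after_masking`, the percentile value, and the toy score function are all hypothetical choices made for illustration.

```python
import numpy as np


def patch_from_saliency(saliency, percentile=95.0):
    """Bounding box (row_min, row_max, col_min, col_max), inclusive,
    around all pixels at or above the given saliency percentile.
    Assumption: percentile thresholding is used only for illustration."""
    threshold = np.percentile(saliency, percentile)
    rows, cols = np.nonzero(saliency >= threshold)
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())


def score_drop_after_masking(score_fn, image, box, fill=0.0):
    """Deletion-style check: mask the patch and report how much the
    score falls. A larger drop suggests a more faithful patch."""
    r0, r1, c0, c1 = box
    masked = image.copy()
    masked[r0:r1 + 1, c0:c1 + 1] = fill
    return score_fn(image) - score_fn(masked)


# Toy setup: the hypothetical "model" only looks at a 2x2 region.
image = np.zeros((8, 8))
image[2:4, 5:7] = 1.0


def toy_score(x):
    # Stand-in for a prototype similarity score (an assumption).
    return float(x[2:4, 5:7].sum())


# A saliency map that highlights the region the score depends on ...
faithful_saliency = image.copy()
# ... and one that highlights an unrelated corner of the image.
unfaithful_saliency = np.zeros((8, 8))
unfaithful_saliency[5:7, 0:2] = 1.0

faithful_box = patch_from_saliency(faithful_saliency)
unfaithful_box = patch_from_saliency(unfaithful_saliency)

faithful_drop = score_drop_after_masking(toy_score, image, faithful_box)
unfaithful_drop = score_drop_after_masking(toy_score, image, unfaithful_box)
print(faithful_box, faithful_drop)
print(unfaithful_box, unfaithful_drop)
```

In this toy case, masking the patch from the first saliency map removes the entire score, while masking the patch from the second leaves the score unchanged. That difference is the kind of gap a quantitative faithfulness comparison between visualisation methods would aim to expose.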