Evaluating Post-hoc Interpretability with Intrinsic Interpretability
This work addresses the problem of unreliable interpretability methods for clinicians using deep learning in medical diagnostics. However, its contribution is incremental, since it adapts existing methods and metrics to a specific domain.
The paper tackles the challenge of validating post-hoc interpretability methods in medical imaging by comparing them to an intrinsically interpretable model, ProtoPNet, using adapted saliency metrics on a breast cancer dataset. The results show that SmoothGrad and Occlusion had a statistically higher overlap with ProtoPNet, while Deconvolution and LIME had the least.
Although Convolutional Neural Networks have reached human-level performance in some medical tasks, their clinical use has been hindered by their lack of interpretability. Two major interpretability strategies have been proposed to tackle this problem: post-hoc methods and intrinsic methods. Although there are several post-hoc methods to interpret deep learning (DL) models, the explanations they provide vary significantly, and they are difficult to validate because no ground truth is available. To address this challenge, we adapted the intrinsically interpretable ProtoPNet to histopathology imaging and compared its attribution maps with the saliency maps produced by post-hoc methods. To evaluate the similarity between the saliency maps and the attribution maps, we adapted 10 saliency metrics from the saliency model literature and validated the proposed approach on the breast cancer metastases detection dataset PatchCamelyon, which contains 327,680 patches of histopathological images of sentinel lymph node sections. Overall, SmoothGrad and Occlusion were found to have a statistically higher overlap with ProtoPNet, while Deconvolution and LIME were found to have the least.
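
To make the comparison step concrete, the following is a minimal sketch of how the overlap between a post-hoc saliency map and a ProtoPNet attribution map could be scored. The text above does not list which 10 metrics were adapted, so the two shown here, Pearson's correlation coefficient (CC) and histogram intersection (SIM), are assumptions chosen because they are common in the saliency model literature. The function names and the toy 2x2 maps are hypothetical and are not taken from the paper.

```python
import numpy as np


def normalize_to_distribution(m, eps=1e-12):
    """Shift a map to be non-negative and scale it to sum to 1."""
    m = np.asarray(m, dtype=np.float64)
    m = m - m.min()
    total = m.sum()
    if total < eps:
        # A constant map carries no spatial preference: use a uniform map.
        return np.full(m.shape, 1.0 / m.size)
    return m / total


def pearson_cc(a, b):
    """Pearson correlation between two maps (assumed metric, range [-1, 1])."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def similarity(a, b):
    """Histogram intersection of the two normalized maps (assumed metric, range [0, 1])."""
    pa = normalize_to_distribution(a)
    pb = normalize_to_distribution(b)
    return float(np.minimum(pa, pb).sum())


# Toy stand-ins for a post-hoc saliency map and a ProtoPNet attribution map.
post_hoc_map = np.array([[1.0, 0.0], [0.0, 0.0]])
protopnet_map = np.array([[0.0, 0.0], [0.0, 1.0]])

cc_same = pearson_cc(post_hoc_map, post_hoc_map)
sim_same = similarity(post_hoc_map, post_hoc_map)
cc_disjoint = pearson_cc(post_hoc_map, protopnet_map)
sim_disjoint = similarity(post_hoc_map, protopnet_map)
```

In a comparison like the one described, such scores would be computed per image for each post-hoc method against the ProtoPNet map and then compared statistically across methods; the statistical test used is not specified in the text above.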