CV · Sep 25, 2023

Pixel-Grounded Prototypical Part Networks

Zachariah Carmichael, Suhas Lohit, Anoop Cherian, Michael Jones, Walter Scheirer

arXiv:2309.14531v1 · 13.620 citations · h-index: 41

Originality: Incremental advance

AI Analysis

This work addresses interpretability issues in prototypical part neural networks (ProtoPartNNs), of interest to both researchers and practitioners. It is rated an incremental advance because it builds on existing ProtoPartNNs.

The paper tackles the problem that existing prototypical part neural networks (ProtoPartNNs) misleadingly localize to entire images rather than object parts, contrary to their interpretability claims. It introduces PIXPNET, which uses new architectural constraints and pixel space mapping to achieve quantifiably improved interpretability without sacrificing accuracy.

Prototypical part neural networks (ProtoPartNNs), namely PROTOPNET and its derivatives, are an intrinsically interpretable approach to machine learning. Their prototype learning scheme enables intuitive explanations of the form, this (prototype) looks like that (testing image patch). But, does this actually look like that? In this work, we delve into why object part localization and associated heat maps in past work are misleading. Rather than localizing to object parts, existing ProtoPartNNs localize to the entire image, contrary to generated explanatory visualizations. We argue that detraction from these underlying issues is due to the alluring nature of visualizations and an over-reliance on intuition. To alleviate these issues, we devise new receptive field-based architectural constraints for meaningful localization and a principled pixel space mapping for ProtoPartNNs. To improve interpretability, we propose additional architectural improvements, including a simplified classification head. We also make additional corrections to PROTOPNET and its derivatives, such as the use of a validation set, rather than a test set, to evaluate generalization during training. Our approach, PIXPNET (Pixel-grounded Prototypical part Network), is the only ProtoPartNN that truly learns and localizes to prototypical object parts. We demonstrate that PIXPNET achieves quantifiably improved interpretability without sacrificing accuracy.
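The localization issue described above comes down to receptive field size: if one latent-space location can see most of the input image, a prototype matched at that location cannot be attributed to a small object part. The sketch below is not taken from the paper. It uses the standard receptive-field recurrence for a plain stack of convolution and pooling layers, and both the function name `receptive_field` and the example layer stack are hypothetical, chosen only for illustration.

```python
def receptive_field(layers):
    """Return (receptive_field, cumulative_stride) for a layer stack.

    `layers` is a list of (kernel_size, stride) tuples ordered from
    input to output. Dilation is assumed to be 1 for every layer;
    padding does not change the receptive field size.
    """
    rf = 1    # one output unit of an empty stack sees exactly one pixel
    jump = 1  # distance in input pixels between adjacent output units
    for kernel_size, stride in layers:
        # Each layer widens the field by (k - 1) steps of the current jump.
        rf += (kernel_size - 1) * jump
        # Striding spreads subsequent output units further apart.
        jump *= stride
    return rf, jump


# Hypothetical backbone: two 3x3 convs, a 2x2 pool, two 3x3 convs, a 2x2 pool.
example_stack = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
rf, jump = receptive_field(example_stack)
print(f"receptive field: {rf} px, stride: {jump} px")
```

Comparing the returned receptive field against the input resolution gives a quick check of whether a latent patch can plausibly correspond to an object part rather than to the whole image.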
