iCaps: An Interpretable Classifier via Disentangled Capsule Networks
This work improves the interpretability of image classification models, which matters to users who need transparent AI decisions. The contribution is incremental in that it builds on existing Capsule Networks.
The paper tackles the limited interpretability of Capsule Networks for image classification by addressing two limitations: class capsules carry classification-irrelevant information, and the entities they represent overlap. The resulting model, iCaps, provides clear rationales for its predictions without performance degradation, as demonstrated on three datasets.
We propose an interpretable Capsule Network, iCaps, for image classification. A capsule is a group of neurons nested inside a layer, and a capsule in the last layer is called a class capsule: a vector whose norm indicates the predicted probability of its class. Using the class capsule, existing Capsule Networks already provide some level of interpretability. However, two limitations degrade their interpretability: 1) the class capsule also includes classification-irrelevant information, and 2) the entities represented by the class capsule overlap. In this work, we address these two limitations using a novel class-supervised disentanglement algorithm and an additional regularizer, respectively. Through quantitative and qualitative evaluations on three datasets, we demonstrate that the resulting classifier, iCaps, provides a prediction along with clear rationales behind it, with no performance degradation.
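
To make the role of the class capsule concrete, the following minimal sketch shows how a prediction can be read off from capsule norms. It assumes the squashing nonlinearity from the original Capsule Network formulation, which maps each vector to a norm in [0, 1); the abstract does not state which nonlinearity iCaps uses, so this is an illustration rather than the authors' implementation. The function names and the input values are hypothetical.

```python
import math


def squash(vector):
    """Rescale a vector so its norm lies in [0, 1) while keeping its direction.

    Assumption: this is the squashing function of the original Capsule
    Network formulation, not necessarily the one used by iCaps.
    """
    sq_norm = sum(x * x for x in vector)
    if sq_norm == 0.0:
        return [0.0 for _ in vector]
    # New norm is sq_norm / (1 + sq_norm); divide by the old norm to rescale.
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vector]


def capsule_norm(vector):
    """Euclidean norm of a capsule, read as the class score."""
    return math.sqrt(sum(x * x for x in vector))


def predict(class_capsules):
    """Return the index of the longest class capsule and all norms."""
    norms = [capsule_norm(c) for c in class_capsules]
    best = max(range(len(norms)), key=norms.__getitem__)
    return best, norms


# Hypothetical pre-activation outputs: three classes, four dimensions each.
raw = [
    [0.1, -0.2, 0.0, 0.1],
    [2.0, 1.0, -1.5, 0.5],
    [0.3, 0.3, -0.1, 0.0],
]
class_capsules = [squash(v) for v in raw]
predicted_class, scores = predict(class_capsules)
```

In this sketch the norm carries the class score, while the direction of each class capsule is what remains available for interpretation. That direction is where the two limitations named above, classification-irrelevant information and overlapping entities, would appear.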