InfoDisent: Explainability of Image Classification Models by Information Disentanglement
This work addresses the need for better explainability of AI systems for both users and developers, although its contribution is incremental because it builds on existing methods such as ProtoPNets.
The authors tackle the problem of explaining image classification models by introducing InfoDisent, a hybrid method that disentangles information into interpretable atomic concepts. Computational experiments and user studies on datasets including ImageNet support its effectiveness.
In this work, we introduce InfoDisent, a hybrid approach to explainability based on the information bottleneck principle. InfoDisent enables the disentanglement of information in the final layer of any pretrained model into atomic concepts, which can be interpreted as prototypical parts. This approach merges the flexibility of post-hoc methods with the concept-level modeling capabilities of self-explainable neural networks, such as ProtoPNets. We demonstrate the effectiveness of InfoDisent through computational experiments and user studies across various datasets using modern backbones such as ViTs and convolutional networks. Notably, InfoDisent generalizes the prototypical parts approach to novel domains (ImageNet).
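As a rough illustration of what disentangling the final layer of a frozen backbone could look like, the NumPy sketch below rotates the channels of a feature map with an orthogonal matrix, then reads off each channel's peak activation and its spatial location as a candidate prototypical part. The orthogonal rotation, the max pooling, the random matrices, and all function names are assumptions made for this sketch. The abstract does not specify the mechanism, and in practice the rotation and classifier weights would be learned rather than sampled.

```python
import numpy as np


def random_orthogonal(channels, seed=0):
    """Hypothetical stand-in for a learned rotation: a random orthogonal matrix from QR."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((channels, channels)))
    return q


def disentangle(features, rotation):
    """Apply the rotation along the channel axis of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return (rotation @ flat).reshape(c, h, w)


def concept_activations(rotated):
    """For each channel, return its peak activation and the (row, col) where it occurs."""
    c, h, w = rotated.shape
    flat = rotated.reshape(c, h * w)
    idx = flat.argmax(axis=1)
    scores = flat[np.arange(c), idx]
    locations = np.stack(np.unravel_index(idx, (h, w)), axis=1)
    return scores, locations


def classify(scores, weights):
    """Linear read-out over concept scores; weights has shape (num_classes, C)."""
    return int(np.argmax(weights @ scores))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    features = rng.standard_normal((8, 4, 4))   # assumed output of a frozen backbone
    rotation = random_orthogonal(8)
    scores, locations = concept_activations(disentangle(features, rotation))
    weights = np.abs(rng.standard_normal((3, 8)))  # assumed non-negative class weights
    print("predicted class:", classify(scores, weights))
    print("strongest concept channel:", int(scores.argmax()),
          "at location", tuple(int(v) for v in locations[scores.argmax()]))
```

Because the rotation is orthogonal, it only re-expresses the backbone's features in a new basis without discarding information, and each channel's peak location gives an image region that can be shown to a user as the evidence for that concept.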