Explaining Representation by Mutual Information
The paper provides a theoretically grounded framework for interpretability in machine learning and addresses the need for reliable explanations of what representations encode. The contribution is incremental, however, because it builds on existing mutual information theory.
The paper tackles the problem of explaining neural network representations by proposing a mutual information-based method that decomposes representations into three exhaustive components, demonstrating its interpretive power through visualizations on tasks like image classification and few-shot learning.
As interpretability gains attention in machine learning, there is a growing need for reliable methods that fully explain the content of representations. We propose a mutual information (MI)-based method that decomposes neural network representations into three exhaustive components: total mutual information, decision-related information, and redundant information. This theoretically complete framework captures the entire input-representation relationship and goes beyond partial explanations such as those produced by Grad-CAM. We use two lightweight modules that can be integrated into architectures such as CNNs and Transformers to estimate these components. We then demonstrate their interpretive power through visualizations on a ResNet and a prototype network applied to image classification and few-shot learning tasks. Our approach has three distinguishing features: 1. It is rooted in mutual information theory and delivers a thorough, theoretically grounded interpretation that extends beyond the scope of existing interpretability methods. 2. Conventional methods explain decisions, whereas our approach interprets representations. 3. It integrates into pre-existing network architectures and requires fine-tuning only of the inserted modules.
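The abstract does not give the exact definitions or estimators, and the paper estimates its components with two learned modules. The sketch below only illustrates how a total MI term can split into a label-related part and a remainder. It rests on several assumptions. It takes "total" to be I(X;Z) between input X and representation Z. It takes "decision-related" to be I(Z;Y) with the label Y. It takes "redundant" to be I(X;Z|Y). It replaces the learned estimators with simple plug-in estimates on small discrete samples. The identity I(X;Z) = I(Z;Y) + I(X;Z|Y) holds when Z and Y are conditionally independent given X, for example when both are functions of X. All function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), estimated from paired samples."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def conditional_mutual_information(a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(list(zip(a, c))) + entropy(list(zip(b, c)))
            - entropy(list(zip(a, b, c))) - entropy(c))


def decompose(x, z, y):
    """Hypothetical split of total I(X;Z) into I(Z;Y) plus I(X;Z|Y).

    Assumes Z and Y are conditionally independent given X, so that
    I(X;Z) = I(Z;Y) + I(X;Z|Y). This is an illustrative reading,
    not the paper's own definitions.
    """
    total = mutual_information(x, z)
    decision_related = mutual_information(z, y)
    redundant = conditional_mutual_information(x, z, y)
    return total, decision_related, redundant


if __name__ == "__main__":
    # Toy data: 8 equally likely inputs, a 2-bit "representation" z = x % 4,
    # and a binary label y = x % 2 that z fully determines.
    x = list(range(8))
    z = [v % 4 for v in x]
    y = [v % 2 for v in x]
    print(decompose(x, z, y))  # (2.0, 1.0, 1.0)
```

In the toy case, the representation carries 2 bits about the input. One bit determines the label. The other bit is unrelated to the label, so the two parts sum to the total.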