LG · LO · Feb 5, 2025

Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons

arXiv:2502.03391v322.616 citations · h-index: 8 · Has Code · ICLR

Originality: Incremental advance

AI Analysis

This work addresses the need for interpretable AI by providing a more efficient and reliable method for generating explanations. It is rated as an incremental advance because it builds on the existing concept of minimal sufficient reasons.

The paper addresses the problem of generating minimal sufficient reasons for neural network predictions. Existing post-hoc methods are computationally inefficient and rely on sampling out-of-distribution inputs. The proposed self-supervised training approach instead produces concise and faithful subsets more efficiently while maintaining comparable predictive performance.

*Minimal sufficient reasons* represent a prevalent form of explanation - the smallest subset of input features which, when held constant at their corresponding values, ensure that the prediction remains unchanged. Previous *post-hoc* methods attempt to obtain such explanations but face two main limitations: (1) Obtaining these subsets poses a computational challenge, leading most scalable methods to converge towards suboptimal, less meaningful subsets; (2) These methods heavily rely on sampling out-of-distribution input assignments, potentially resulting in counterintuitive behaviors. To tackle these limitations, we propose in this work a self-supervised training approach, which we term *sufficient subset training* (SST). Using SST, we train models to generate concise sufficient reasons for their predictions as an integral part of their output. Our results indicate that our framework produces succinct and faithful subsets substantially more efficiently than competing post-hoc methods, while maintaining comparable predictive performance.
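To make the definition of a minimal sufficient reason concrete, here is a minimal sketch on a toy binary classifier. It is not the paper's SST method: `predict`, `is_sufficient`, and `minimal_sufficient_reason` are hypothetical names introduced only for illustration, and the check is a brute-force enumeration over all assignments of the free features. That enumeration is exponential in the number of features, which hints at why obtaining such subsets post hoc is computationally hard.

```python
from itertools import combinations, product


def predict(x):
    # Hypothetical toy classifier (an assumption, not from the paper):
    # outputs 1 exactly when x0 AND (x1 OR x2) holds; x3 is ignored.
    return int(x[0] and (x[1] or x[2]))


def is_sufficient(model, x, subset):
    """Return True if fixing the features in `subset` to their values in x
    keeps the prediction unchanged for every assignment of the other features."""
    free = [i for i in range(len(x)) if i not in subset]
    target = model(x)
    for values in product([0, 1], repeat=len(free)):
        z = list(x)
        for i, v in zip(free, values):
            z[i] = v
        if model(z) != target:
            return False
    return True


def minimal_sufficient_reason(model, x):
    """Brute-force search for a smallest sufficient subset of feature indices."""
    n = len(x)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if is_sufficient(model, x, set(subset)):
                return set(subset)
    # Not reached: the full feature set is always sufficient.


x = [1, 1, 0, 1]
reason = minimal_sufficient_reason(predict, x)
print(reason)  # {0, 1}
```

For the input `[1, 1, 0, 1]`, fixing features 0 and 1 already forces the prediction to 1 regardless of the remaining features, and no single feature does so, which makes `{0, 1}` a minimal sufficient reason in this toy setting.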

