LG · May 14

From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks

arXiv:2605.15328 · 3.8
AI Analysis

This work offers a novel perspective on feature attribution for explainable AI, but its results are limited to simple fully connected networks and are incremental over existing methods.

The paper proposes a feature attribution method for fully connected neural networks that perturbs the weights attached to input features instead of the feature values themselves. The aim is to mitigate known limitations of occlusion techniques, such as added bias and out-of-distribution data. The resulting methods, XWP and XWP_c, achieve competitive performance on standard metrics for simple DNNs.

Fully Connected Neural Networks (FCNNs) are often regarded as simple and intuitive architectures, yet they serve as the foundation for more complex models. Nonetheless, the lack of consensus on their interpretability continues to pose challenges, underscoring the enduring relevance of simpler, attribution-based approaches for understanding even the most advanced neural architectures. In this regard, we explore a novel idea for estimating feature attribution, by applying perturbation to the features' attached weights instead of their values. This method offers a fresh perspective aimed at mitigating common limitations in Occlusion techniques, such as Added Bias and Out-of-Distribution data. The application of this rule leads to the formation of a pair of novel attribution methods we call XWP and XWP_c. Founded on simple rules, our methods achieve competitive performance in identifying image signals for simple DNNs, competing with the most established attribution methods on standard baseline metrics. Our work thus contributes to the field of Explainability by introducing a robust framework that paves the way for addressing these long-standing vulnerabilities, and leads to more reliable and interpretable model explanations.
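The abstract does not spell out the XWP or XWP_c rules, so the following is only a minimal sketch of the general idea it describes: scoring a feature by perturbing the first-layer weights attached to it while leaving the input untouched, shown next to ordinary occlusion for contrast. The two-layer ReLU network, the choice of zeroing as the perturbation, and all function names (`forward`, `weight_perturbation_attribution`, `occlusion_attribution`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def forward(x, W1, b1, W2, b2):
    """Two-layer fully connected network with ReLU; returns the output logits."""
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2


def weight_perturbation_attribution(x, W1, b1, W2, b2, target):
    """Hypothetical weight-perturbation score (not the paper's XWP rule).

    Feature i is scored by the drop in the target logit when the first-layer
    weights attached to feature i are zeroed. The input x is never modified.
    """
    base = forward(x, W1, b1, W2, b2)[target]
    scores = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        W1_pert = W1.copy()
        W1_pert[:, i] = 0.0  # perturb the weights, not the input value
        scores[i] = base - forward(x, W1_pert, b1, W2, b2)[target]
    return scores


def occlusion_attribution(x, W1, b1, W2, b2, target, baseline=0.0):
    """Standard occlusion: replace x[i] with a baseline value and re-evaluate."""
    base = forward(x, W1, b1, W2, b2)[target]
    scores = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        x_occ = x.copy()
        x_occ[i] = baseline  # perturb the input value itself
        scores[i] = base - forward(x_occ, W1, b1, W2, b2)[target]
    return scores


# Small random network, used only to exercise the two functions.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

wp_scores = weight_perturbation_attribution(x, W1, b1, W2, b2, target=0)
occ_scores = occlusion_attribution(x, W1, b1, W2, b2, target=0, baseline=0.5)
```

One caveat about this particular sketch: zeroing a first-layer weight column gives the same result as occluding with a zero baseline, because `W1[:, i] * x[i]` vanishes in both cases. The two functions differ only when the occlusion baseline is non-zero, so whatever distinguishes XWP from occlusion must come from a perturbation rule that the abstract does not describe.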

