Toward Faithfulness-guided Ensemble Interpretation of Neural Network
This work addresses the need for better interpretability in neural networks, offering incremental improvements in faithfulness metrics for researchers and practitioners in machine learning.
The paper tackles the problem of providing interpretable and faithful explanations for neural network inferences. It introduces the Faithfulness-guided Ensemble Interpretation (FEI) framework, which improves faithfulness through a smooth approximation and diverse variants, and reports better visualizations and quantitative scores than existing methods.
Interpretable and faithful explanations for specific neural inferences are crucial for understanding and evaluating model behavior. Our work introduces \textbf{F}aithfulness-guided \textbf{E}nsemble \textbf{I}nterpretation (\textbf{FEI}), a framework that broadens where faithfulness is enforced and strengthens how well it is achieved, advancing interpretability through superior visualization. Guided by an analysis of existing evaluation benchmarks, \textbf{FEI} employs a smooth approximation to raise quantitative faithfulness scores. Several variants of \textbf{FEI} target faithfulness in hidden-layer encodings, extending interpretability to intermediate representations. Additionally, we propose a novel qualitative metric that assesses hidden-layer faithfulness. In extensive experiments, \textbf{FEI} surpasses existing methods, demonstrating substantial advances in both qualitative visualization and quantitative faithfulness scores. Our research establishes a comprehensive framework for improving faithfulness in neural network explanations, emphasizing both breadth and precision.
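The abstract does not name the evaluation benchmarks it analyzes. To make "quantitative faithfulness score" concrete, here is a minimal sketch of one widely used test of this kind, the deletion metric. It assumes an image-like input, an (H, W) saliency map, and a hypothetical `score_fn` that returns the model's target-class score. Pixels are removed in order of attributed importance and the area under the score curve is measured. A faithful map makes the score fall quickly, giving a low area. This illustrates the general benchmark family only. It is not FEI's smooth approximation and not necessarily the exact benchmark used in the paper.

```python
import numpy as np
from typing import Callable


def deletion_auc(
    image: np.ndarray,
    saliency: np.ndarray,
    score_fn: Callable[[np.ndarray], float],
    steps: int = 10,
    baseline_value: float = 0.0,
) -> float:
    """Area under the deletion curve (lower suggests a more faithful map).

    image:     (H, W) or (H, W, C) input.
    saliency:  (H, W) attribution map for the target class.
    score_fn:  hypothetical callable returning the model's target-class score.
    """
    h, w = saliency.shape
    n_pixels = h * w
    # Flat pixel indices sorted from most to least important.
    order = np.argsort(saliency.ravel())[::-1]
    per_step = int(np.ceil(n_pixels / steps))

    current = image.astype(float)  # astype returns a copy; the input is untouched
    fractions = [0.0]
    scores = [score_fn(current)]
    for i in range(steps):
        idx = order[i * per_step:(i + 1) * per_step]
        rows, cols = np.unravel_index(idx, (h, w))
        current[rows, cols] = baseline_value  # "delete" this chunk of pixels
        fractions.append(min((i + 1) * per_step, n_pixels) / n_pixels)
        scores.append(score_fn(current))

    # Trapezoidal rule over (fraction removed, score) pairs.
    x = np.asarray(fractions)
    y = np.asarray(scores)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

As a usage example, consider a toy "model" whose score is just the value of pixel (0, 0). A map that ranks that pixel first yields a lower deletion area than one that ranks it last. The mirror-image insertion metric starts from the baseline and adds pixels back, and there a higher area is better.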