The Logic Traps in Evaluating Post-hoc Interpretations
This work highlights foundational problems in interpretability evaluation that matter to AI and ML researchers and practitioners. Its contribution is incremental, however, because it critiques existing evaluation methods rather than proposing new ones.
The paper identifies critical logic traps in existing evaluation methods for post-hoc interpretations of machine learning models, arguing that researchers should acknowledge these issues rather than ignore them.
Post-hoc interpretation aims to explain a trained model and reveal how the model arrives at a decision. Though research on post-hoc interpretations has developed rapidly, one growing pain in this field is the difficulty in evaluating interpretations. There are some crucial logic traps behind existing evaluation methods, which are ignored by most works. In this opinion piece, we summarize four kinds of evaluation methods and point out the corresponding logic traps behind them. We argue that we should be clear about these traps rather than ignore them and draw conclusions assertively.