Pre or Post-Softmax Scores in Gradient-based Attribution Methods, What is Best?
This work addresses a methodological choice facing researchers and practitioners who use interpretability tools in machine learning. Its contribution is incremental, since it compares existing approaches rather than introducing new ones.
The paper investigates whether pre-softmax or post-softmax gradients are more effective for gradient-based attribution methods in neural network classifiers, analyzing their practical differences and trade-offs without reporting specific numerical results.
Gradient-based attribution methods for neural networks used as classifiers rely on gradients of the network scores. Here we discuss the practical differences between using gradients of pre-softmax scores and gradients of post-softmax scores, along with their respective advantages and disadvantages.
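To make the two choices concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy linear classifier with pre-softmax scores z = W x and post-softmax scores p = softmax(z); the weights `W`, input `x`, class index `c`, and all function names are hypothetical and chosen only for illustration. For this model the pre-softmax gradient of class c is row c of W, and the post-softmax gradient follows from the chain rule: dp_c/dx = p_c (dz_c/dx - sum_k p_k dz_k/dx).

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, x):
    """Pre-softmax scores z = W x of a toy linear classifier (an assumption)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def pre_softmax_gradient(W, x, c):
    """Gradient of the pre-softmax score z_c with respect to x.

    For the linear model assumed here this is simply row c of W.
    """
    return list(W[c])


def post_softmax_gradient(W, x, c):
    """Gradient of the post-softmax score p_c with respect to x.

    Chain rule: dp_c/dx = p_c * (dz_c/dx - sum_k p_k * dz_k/dx).
    """
    p = softmax(logits(W, x))
    n = len(x)
    # Probability-weighted average of the per-class logit gradients.
    mean_row = [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(n)]
    return [p[c] * (W[c][j] - mean_row[j]) for j in range(n)]


# Hypothetical 3-class, 3-feature example.
W = [[1.0, -2.0, 0.5],
     [0.3, 0.8, -1.0],
     [-0.7, 0.2, 1.5]]
x = [0.5, -1.0, 2.0]
c = 0

pre_grad = pre_softmax_gradient(W, x, c)
post_grad = post_softmax_gradient(W, x, c)

print("pre-softmax gradient: ", pre_grad)
print("post-softmax gradient:", post_grad)
```

One difference this sketch makes easy to check: adding the same vector to every row of `W` leaves the softmax outputs, and therefore the post-softmax gradient, unchanged, while the pre-softmax gradient shifts by that vector. In a real network the per-class logit gradients would come from automatic differentiation rather than from rows of a weight matrix.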