Constructing sensible baselines for Integrated Gradients
This work addresses interpretability challenges in machine learning, particularly in scientific applications such as particle physics. It is incremental, however, since it focuses on improving baselines within an existing attribution method.
The paper tackles the problem of designing effective baselines for Integrated Gradients, with the goal of improving feature attributions in black-box models. It finds that a baseline averaged over background events yields more reasonable attributions than a zero-vector baseline.
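For reference, the Integrated Gradients attribution from the original IG paper (Sundararajan et al., 2017) is shown below for a model $F$, an input $x$, and a baseline $x'$. It also satisfies a completeness property: the attributions sum to $F(x) - F(x')$. The second line writes the two baselines under comparison. The averaged form, the feature-wise mean of $N$ background events $x^{(j)}_{\text{bkg}}$, is an assumed reading of "averaged baseline" and may differ from the paper's exact construction.

$$
\mathrm{IG}_i(x; x') = (x_i - x'_i)\int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\,d\alpha,
\qquad
\sum_i \mathrm{IG}_i(x; x') = F(x) - F(x')
$$

$$
x'_{\text{zero}} = \mathbf{0},
\qquad
x'_{\text{avg}} = \frac{1}{N}\sum_{j=1}^{N} x^{(j)}_{\text{bkg}}
$$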
Machine learning methods have seen a meteoric rise in applications across the scientific community. However, little effort has gone into understanding these "black box" models. Using a case study from particle physics, we show how integrated gradients (IGs) can be applied to understand these models by designing different baselines. We find that the zero-vector baseline does not provide good feature attributions, and that an averaged baseline sampled from the background events consistently provides more reasonable attributions.
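The following is a minimal sketch of how a zero-vector baseline and an averaged background baseline could be compared. It rests on several assumptions. A toy logistic-regression model with an analytic gradient stands in for the paper's neural network. The "signal" event and "background" events are synthetic Gaussian data. The averaged baseline is taken to be the feature-wise mean of the background events, since the abstract does not spell out how the averaging is done. The names `model`, `model_grad`, and `integrated_gradients` are hypothetical. The path integral is approximated with a midpoint Riemann sum along the straight line from baseline to input, and the completeness property serves as a sanity check.

```python
import numpy as np

def model(x, w, b):
    """Hypothetical toy classifier: logistic regression, F(x) = sigmoid(w.x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x, w, b):
    """Analytic gradient dF/dx = F(1 - F) w of the toy logistic model."""
    f = model(x, w, b)
    return f * (1.0 - f) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG on the straight path baseline -> x with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps             # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)    # shape (steps, n_features)
    grads = np.array([model_grad(p, w, b) for p in path]) # gradient at each path point
    return (x - baseline) * grads.mean(axis=0)            # (x_i - x'_i) * average gradient

rng = np.random.default_rng(0)
n_features = 4
w = rng.normal(size=n_features)
b = 0.1

# Synthetic (assumed) data: background events and a single event to explain.
background = rng.normal(loc=1.0, scale=0.5, size=(1000, n_features))
x = rng.normal(loc=2.0, scale=0.5, size=n_features)

zero_baseline = np.zeros(n_features)
avg_baseline = background.mean(axis=0)  # assumed: feature-wise mean of background events

ig_zero = integrated_gradients(x, zero_baseline, w, b)
ig_avg = integrated_gradients(x, avg_baseline, w, b)

# Completeness check: each attribution vector should sum to F(x) - F(baseline).
print(ig_zero.sum(), model(x, w, b) - model(zero_baseline, w, b))
print(ig_avg.sum(), model(x, w, b) - model(avg_baseline, w, b))
```

IG assigns zero attribution to any feature whose value equals the baseline's value. The choice of baseline therefore directly determines which features can receive credit, and it also sets the reference output $F(x')$ against which the attributions are measured.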