Stability Guarantees for Feature Attributions with Multiplicative Smoothing
This work addresses unreliable explanations in machine learning and targets practitioners who need robust feature attributions. Its contribution is incremental in that it builds on existing smoothing techniques.
The paper tackles the lack of formal guarantees in feature attribution methods for machine learning models. It analyzes stability as a property of attributions and proves that relaxed variants of stability are guaranteed when the model is sufficiently Lipschitz with respect to feature masking. It then develops Multiplicative Smoothing (MuS) to obtain such a model, and shows on vision and language models that MuS, integrated with classifiers and attribution methods such as LIME and SHAP, provides non-trivial stability guarantees.
Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. We develop a smoothing method called Multiplicative Smoothing (MuS) to achieve such a model. We show that MuS overcomes the theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with various feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
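The idea of smoothing a classifier through multiplicative feature masks can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that smoothing averages a base classifier over random Bernoulli keep-masks multiplied elementwise into the input, and it estimates that average by Monte Carlo sampling. The names `smoothed_classifier`, `h`, `alpha`, and `lam`, as well as the toy classifier, are hypothetical and do not come from the text above.

```python
import numpy as np

def smoothed_classifier(h, x, alpha, lam=0.25, n_samples=2000, seed=0):
    """Monte Carlo estimate of the mean of h(x * alpha * s) over random masks s.

    Assumption (not from the source): each s_i is drawn independently
    from Bernoulli(lam), i.e. each feature is kept with probability lam.
    `alpha` is the binary attribution mask selecting the features to keep.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Each row of s is one random keep-mask over the features.
    s = rng.random((n_samples, x.size)) < lam
    masked = x * alpha * s  # broadcasts to shape (n_samples, n_features)
    return np.mean([h(row) for row in masked], axis=0)

# Hypothetical toy base classifier with scores in [0, 1] for two classes.
def h(z):
    on = float(z.sum() > 1.5)
    return np.array([on, 1.0 - on])

x = np.ones(4)
full = smoothed_classifier(h, x, alpha=[1, 1, 1, 1])
drop_last = smoothed_classifier(h, x, alpha=[1, 1, 1, 0])
# Largest change in any class score after masking out one feature.
change = np.abs(full - drop_last).max()
```

Because both calls reuse the same seed, they share the same random masks, so the output can change only on samples whose mask keeps the dropped feature. In this sketch, `change` therefore stays close to or below `lam`, which mirrors the kind of Lipschitz behavior with respect to feature masking that the abstract describes.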