Noise Modulation: Let Your Model Interpret Itself
This work addresses the need for interpretable AI models in applications that require transparency. It is incremental, however, because it builds on adversarial training concepts.
The paper tackles the problem of improving interpretability in deep neural networks. It proposes noise modulation, an efficient, model-agnostic method for training models with interpretable input-gradients, and experiments show that it effectively increases interpretability.
Given the great success of Deep Neural Networks (DNNs) and their black-box nature, the interpretability of these models has become an important issue. Most previous research focuses on post-hoc interpretation of a trained model. Recently, adversarial training has shown that a model can acquire interpretable input-gradients through training. However, adversarial training is an inefficient way to obtain interpretability. To resolve this problem, we construct an approximation of the adversarial perturbations and discover a connection between adversarial training and amplitude modulation. Based on a digital analogy, we propose noise modulation as an efficient, model-agnostic alternative for training a model that interprets itself with input-gradients. Experimental results show that noise modulation can effectively increase the interpretability of input-gradients in a model-agnostic manner.
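The following minimal sketch in pure Python makes the terms concrete. It uses a logistic-regression model because its input-gradient has a closed form. The sketch contrasts an FGSM-style adversarial perturbation, x + ε·sign(∇ₓL), with a stand-in for noise modulation. The abstract does not specify the noise-modulation operator, so `noise_modulate` is a hypothetical assumption and not the authors' method. It multiplies each input feature by (1 + σ·n) with Gaussian n, by loose analogy with amplitude modulation. All function names and parameters here are illustrative.

```python
import math
import random
from typing import Callable, List

Vec = List[float]

def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(w: Vec, x: Vec, y: int) -> float:
    # Binary logistic loss with labels y in {-1, +1}: log(1 + exp(-y * w.x)).
    return math.log1p(math.exp(-y * dot(w, x)))

def input_gradient(w: Vec, x: Vec, y: int) -> Vec:
    # Closed-form input-gradient: dL/dx = -y * sigmoid(-y * w.x) * w.
    # This per-feature vector is the saliency map one would inspect.
    s = sigmoid(-y * dot(w, x))
    return [-y * s * wi for wi in w]

def fgsm_perturb(w: Vec, x: Vec, y: int, eps: float) -> Vec:
    # FGSM-style adversarial input: x + eps * sign(dL/dx).
    # Needs the input-gradient to be computed first.
    g = input_gradient(w, x, y)
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]

def noise_modulate(x: Vec, sigma: float, rng: random.Random) -> Vec:
    # HYPOTHETICAL stand-in for noise modulation: multiplicative Gaussian
    # noise x * (1 + sigma * n). No gradient computation is needed.
    return [xi * (1.0 + sigma * rng.gauss(0.0, 1.0)) for xi in x]

def train_step(w: Vec, x: Vec, y: int, lr: float,
               perturb: Callable[[Vec], Vec]) -> Vec:
    # One SGD step on the logistic loss evaluated at a perturbed input.
    x_p = perturb(x)
    s = sigmoid(-y * dot(w, x_p))
    grad_w = [-y * s * xi for xi in x_p]
    return [wi - lr * gi for wi, gi in zip(w, grad_w)]

if __name__ == "__main__":
    rng = random.Random(0)
    w, x, y = [0.5, -1.0, 2.0], [1.0, 2.0, -0.5], 1
    print("input-gradient:", input_gradient(w, x, y))
    w_adv = train_step(w, x, y, 0.1, lambda v: fgsm_perturb(w, v, y, 0.1))
    w_noise = train_step(w, x, y, 0.1, lambda v: noise_modulate(v, 0.1, rng))
    print("after adversarial step:", w_adv)
    print("after noise step:", w_noise)
```

In this sketch, the adversarial step must compute an input-gradient for every example before the weight update. The noise step needs only a random draw. This is one plausible reading of the efficiency argument in the abstract, not a reproduction of the paper's analysis.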