Sparse Oblique Decision Trees: A Tool to Understand and Manipulate Neural Net Features
This work addresses the need of practitioners and researchers for interpretability in deep learning by offering a tool to analyze and manipulate neural net features globally. Its contribution is incremental in that it builds on existing tree optimization methods.
The paper tackles the problem of understanding which internal features of a neural network are responsible for specific class predictions. It does so by replacing part of the network with a sparse oblique decision tree, achieving high accuracy and interpretability on the MNIST and ImageNet datasets. It also demonstrates that these features can be manipulated to carry out adversarial attacks that apply globally across the training and test sets.
The widespread deployment of deep nets in practical applications has led to a growing desire to understand how and why such black-box methods make their predictions. Much work has focused on understanding what part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding which of the internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the neural net with an oblique decision tree having sparse weight vectors at the decision nodes. Using the recently proposed Tree Alternating Optimization (TAO) algorithm, we are able to learn trees that are both highly accurate and interpretable. Such trees can faithfully mimic the part of the neural net they replaced, and hence they can provide insights into the deep net black box. Further, we show we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. These insights and manipulations apply globally to the entire training and test set, not just at a local (single-instance) level. We demonstrate this robustly on the MNIST and ImageNet datasets with LeNet5 and VGG networks.
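To make the idea of a sparse oblique decision node concrete, the following is a minimal sketch and not the TAO algorithm or the authors' implementation. It assumes a hand-set toy tree over five hypothetical features; the names `Node`, `Leaf`, `predict`, `active_features` and `mask_features` are illustrative only. Each decision node routes an instance by the sign of a linear function of the features, and because most weights are exactly zero, the few nonzero entries indicate which features the node actually uses. Setting those features to zero is one assumed way of manipulating them, chosen here only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    label: int  # class predicted at this leaf


@dataclass
class Node:
    weights: List[float]          # sparse: most entries are exactly zero
    bias: float
    left: Union["Node", Leaf]     # taken when w.x + b < 0
    right: Union["Node", Leaf]    # taken when w.x + b >= 0


def predict(tree, x):
    """Route x from the root to a leaf using oblique (linear) splits."""
    while isinstance(tree, Node):
        score = sum(w * xi for w, xi in zip(tree.weights, x)) + tree.bias
        tree = tree.right if score >= 0 else tree.left
    return tree.label


def active_features(node):
    """Indices of the features with nonzero weight at this node."""
    return [i for i, w in enumerate(node.weights) if w != 0.0]


def mask_features(x, indices):
    """Return a copy of x with the given features set to zero (assumed manipulation)."""
    masked = list(x)
    for i in indices:
        masked[i] = 0.0
    return masked


# Toy tree with hand-set weights (an assumption, not a learned tree).
root = Node(weights=[0.0, 2.0, 0.0, -1.0, 0.0], bias=-0.5,
            left=Leaf(0), right=Leaf(1))

features = [0.3, 0.9, 0.1, 0.2, 0.7]   # stand-in for neural net features
original = predict(root, features)
used = active_features(root)
manipulated = predict(root, mask_features(features, used))
```

In this toy setting, `used` lists the only features that influence the decision at the root, and masking them changes the predicted class for any instance that was routed to the right child, since the routing then depends on the bias alone. In the setting the abstract describes, the tree would instead be learned to mimic part of a trained network.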