Rule Extraction from Binary Neural Networks with Convolutional Rules for Model Validation
This work addresses the interpretability problem for deep neural networks, particularly for high-dimensional image data, by providing a method to extract human-understandable logical rules.
This paper tackles the problem of interpreting black-box deep neural networks by extracting logical rules. It introduces first-order convolutional rules, which are extracted from binary neural networks using stochastic local search. The extracted rules model the functionality of the network while remaining interpretable.
Most deep neural networks are considered to be black boxes, meaning their output is hard to interpret. In contrast, logical expressions are considered to be more comprehensible since they use symbols that are semantically close to natural language instead of distributed representations. However, for high-dimensional input data such as images, the individual symbols, i.e. pixels, are not easily interpretable. We introduce the concept of first-order convolutional rules, which are logical rules that can be extracted using a convolutional neural network (CNN), and whose complexity depends on the size of the convolutional filter and not on the dimensionality of the input. Our approach is based on rule extraction from binary neural networks with stochastic local search. We show how to extract rules that are not necessarily short, but characteristic of the input, and easy to visualize. Our experiments show that the proposed approach is able to model the functionality of the neural network while at the same time producing interpretable logical rules.
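To make concrete the claim that rule complexity depends on the filter size rather than on the input dimensionality, the following minimal sketch applies one small logical rule at every position of a binary image, in the way a convolutional filter is applied. It is an illustration only and not the authors' implementation: the representation of a rule as a disjunction of conjunctions over window offsets, the names `rule_fires` and `apply_convolutional_rule`, and the example rule `edge_rule` are all assumptions made here. The stochastic local search used for extraction is not shown.

```python
# Illustrative sketch only (assumed representation, not the paper's code).
# A literal is (row_offset, col_offset, required_value) inside a k x k window.
# A conjunction is a list of literals; a rule in DNF is a list of conjunctions.

def rule_fires(patch, dnf):
    """Return True if at least one conjunction of the DNF holds on the patch."""
    return any(all(patch[r][c] == v for r, c, v in conj) for conj in dnf)

def apply_convolutional_rule(image, dnf, k):
    """Evaluate the same k x k rule at every window (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            # Cut out the k x k window whose top-left corner is (i, j).
            patch = [image[i + r][j:j + k] for r in range(k)]
            row.append(int(rule_fires(patch, dnf)))
        out.append(row)
    return out

# Hypothetical 2 x 2 rule: left column set AND right column unset.
edge_rule = [[(0, 0, 1), (1, 0, 1), (0, 1, 0), (1, 1, 0)]]

image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

feature_map = apply_convolutional_rule(image, edge_rule, k=2)
print(feature_map)
```

In this sketch the rule has four literals regardless of how large the image is; only the number of positions at which it is evaluated grows with the input. The resulting map can be displayed as an image, which is one way such rules could be visualized.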