Logic interpretations of ANN partition cells
This work addresses the problem of making artificial neural networks (ANNs) more interpretable for humans, although it appears incremental because it builds on existing methods for decomposition and logic representation.
The authors tackled the challenge of interpreting artificial neural networks (ANNs) by constructing a bridge between simple ANNs and logic, enabling analysis and manipulation of ANN semantics using logical tools.
Consider a binary classification problem solved by a feed-forward artificial neural network (ANN) composed of a ReLU layer and several linear layers (convolution, sum-pooling, or fully connected), and assume the network has been trained to high accuracy. Despite the many approaches that have been proposed, interpreting an ANN remains challenging for humans. As a new method of interpretation, we construct a bridge between a simple ANN and logic, which lets us analyze and manipulate the semantics of the ANN with the powerful tool set of logic. To achieve this, we decompose the input space of the ANN into several network partition cells. Each network partition cell represents a linear combination that maps the input values to a classifying output value. To interpret the linear map of a partition cell with logic expressions, we propose using minterm values as the input of a simple ANN. From this, we derive logic expressions representing the interaction patterns that separate objects classified as 1 from those classified as 0. To make these logic expressions easier to interpret, we present them as binary logic trees.
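As a rough illustration of partition cells and minterm inputs, the sketch below uses a hypothetical toy network. It has one fully connected layer followed by ReLU and then one fully connected output layer. The weights are chosen by hand, not trained. Each ReLU activation pattern identifies a partition cell. Inside a cell the ReLU acts as a fixed 0/1 mask, so the network collapses to a single affine map. Assuming the inputs are restricted to {0,1}^n, each input assignment is read as a minterm. Collecting the minterms classified as 1 then yields a disjunctive normal form. This is a plain truth-table readout, not the authors' derivation, and it does not build binary logic trees. All names (`W1`, `activation_pattern`, `cell_linear_map`, `class_one_minterms`, and so on) are hypothetical.

```python
from itertools import product

# Hypothetical hand-picked weights (not from the paper, not trained):
# hidden = relu(W1 x + b1), output f(x) = w2 . hidden + b2, class 1 iff f(x) > 0.
W1 = [[1.0, 1.0, 0.0],
      [0.0, -1.0, 1.0]]
b1 = [-0.5, 0.5]
w2 = [1.0, 1.0]
b2 = -0.75


def pre_activations(x):
    """Pre-ReLU values W1 x + b1."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def network(x):
    """Full forward pass through the toy network."""
    hidden = [max(0.0, z) for z in pre_activations(x)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def activation_pattern(x):
    """Identifies the partition cell of x: which ReLU units are active."""
    return tuple(z > 0 for z in pre_activations(x))


def cell_linear_map(pattern):
    """Affine map (weights, bias) the network computes inside one partition cell.

    Inside a cell the ReLU is a fixed 0/1 mask, so
    f(x) = sum over active j of w2[j] * (W1[j] . x + b1[j]) + b2.
    """
    n_inputs = len(W1[0])
    active = [j for j, on in enumerate(pattern) if on]
    weights = [sum(w2[j] * W1[j][i] for j in active) for i in range(n_inputs)]
    bias = sum(w2[j] * b1[j] for j in active) + b2
    return weights, bias


def class_one_minterms(n_inputs):
    """Binary input assignments (minterms) that the cell-wise linear map classifies as 1."""
    result = []
    for bits in product((0, 1), repeat=n_inputs):
        weights, bias = cell_linear_map(activation_pattern(bits))
        if sum(w * b for w, b in zip(weights, bits)) + bias > 0:
            result.append(bits)
    return result


def minterm_expression(names):
    """Disjunctive normal form: OR over the class-1 minterms."""
    terms = []
    for bits in class_one_minterms(len(names)):
        literals = [name if bit else "~" + name for name, bit in zip(names, bits)]
        terms.append("(" + " & ".join(literals) + ")")
    return " | ".join(terms)


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, activation_pattern(bits), network(bits))
    print(minterm_expression(["a", "b", "c"]))
```

With these particular weights, the eight binary inputs fall into three partition cells. Only the assignments (0,0,0) and (0,1,0) are classified as 0, so the resulting six-term DNF is equivalent to `a | c`. Simplifying such an expression, or arranging it as a tree, is a separate step that the sketch does not attempt.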