LG · AI · ML · Feb 21, 2018

Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

arXiv:1802.07384v2 · 65 citations
AI Analysis

This paper addresses the need for interpretable neural network decisions in domains such as finance and automated reasoning, though it appears incremental because it builds on existing correction techniques.

The paper tackles the problem of interpreting neural network decisions by developing an algorithm that generates minimal, stable, and symbolic corrections to an input so that the network's output changes, giving users actionable feedback. The method is evaluated on three models (mortgage repayment prediction, theorem-prover efficiency prediction, and judging drawings of a cat), but the abstract reports no concrete performance numbers.

We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.
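To illustrate why linear constraints arise here, the following minimal sketch (not the paper's algorithm) uses a toy ReLU network with made-up weights. Once the activation pattern at an input is fixed, the network is affine on that region, so "the output becomes positive" is a single linear constraint on the correction. The sketch replaces the linear constraint solver with a closed-form L2 projection onto that constraint, then checks that the corrected input stays in the same activation region. All names (`affine_form`, `minimal_l2_correction`, the weights, the margin) are hypothetical, and the result is a single corrected point rather than a symbolic correction.

```python
def pre_activations(W1, b1, x):
    # Hidden-layer values before the ReLU is applied.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def forward(W1, b1, W2, b2, x):
    # One hidden ReLU layer followed by a scalar linear output.
    hidden = [max(0.0, z) for z in pre_activations(W1, b1, x)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


def activation_pattern(W1, b1, x):
    # True for each hidden unit that is active (positive pre-activation) at x.
    return [z > 0 for z in pre_activations(W1, b1, x)]


def affine_form(W1, b1, W2, b2, pattern):
    # With the activation pattern fixed, the network equals w_eff . x + b_eff.
    n_in = len(W1[0])
    w_eff = [0.0] * n_in
    b_eff = b2
    for row, b, w_out, active in zip(W1, b1, W2, pattern):
        if active:
            for i in range(n_in):
                w_eff[i] += w_out * row[i]
            b_eff += w_out * b
    return w_eff, b_eff


def minimal_l2_correction(w_eff, b_eff, x, margin):
    # Smallest L2 change to x that makes w_eff . x + b_eff equal to margin.
    # Assumes w_eff is not the zero vector.
    score = sum(w * xi for w, xi in zip(w_eff, x)) + b_eff
    norm_sq = sum(w * w for w in w_eff)
    step = (margin - score) / norm_sq
    return [step * w for w in w_eff]


# Toy network and input (illustrative values only).
W1 = [[1.0, 0.5], [-0.5, 1.0]]
b1 = [0.0, 0.0]
W2 = [1.0, 1.0]
b2 = -3.0
x = [1.0, 1.0]  # the network's output is negative here ("rejected")

pattern = activation_pattern(W1, b1, x)
w_eff, b_eff = affine_form(W1, b1, W2, b2, pattern)
delta = minimal_l2_correction(w_eff, b_eff, x, margin=0.1)
corrected = [xi + di for xi, di in zip(x, delta)]

# The affine form is only valid if the corrected point keeps the same pattern.
same_region = activation_pattern(W1, b1, corrected) == pattern
new_score = forward(W1, b1, W2, b2, corrected)
```

If `same_region` were false, the affine form would no longer describe the network at the corrected point, and the activation constraints would have to be added to the problem as further linear inequalities, which is where a proper linear constraint solver becomes necessary.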

