The inverse problem for neural networks
This work addresses interpretability challenges for neural networks, but its contribution is incremental: it builds on a known result about preimages of polyhedral sets.
The paper tackles the problem of computing the preimage of a set under a neural network with piecewise-affine activations. It recalls that the preimage of a polyhedral set is a union of polyhedral sets that can be computed effectively, and it demonstrates applications in network analysis and interpretability.
We study the problem of computing the preimage of a set under a neural network with piecewise-affine activation functions. We recall an old result that the preimage of a polyhedral set is again a union of polyhedral sets and can be effectively computed. We show several applications of computing the preimage for analysis and interpretability of neural networks.
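To make the polyhedral preimage statement concrete, here is a minimal sketch for a single ReLU layer. It is an illustration under stated assumptions, not the paper's algorithm: the layer is assumed to be x -> relu(W x + c), the target set is assumed to be {y : A y <= b}, and the function names (preimage_relu_layer, in_union) are hypothetical. The sketch enumerates every activation pattern, which is exponential in the number of neurons, and it does not prune empty pieces.

```python
from itertools import product


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def relu_layer(W, c, x):
    """Forward pass of the assumed layer x -> relu(W x + c)."""
    return [max(z + c_i, 0.0) for z, c_i in zip(matvec(W, x), c)]


def preimage_relu_layer(A, b, W, c):
    """Preimage of {y : A y <= b} under x -> relu(W x + c).

    Returns a list of pairs (G, h); the preimage is the union of the
    polyhedra {x : G x <= h}, one per activation pattern. Some pieces
    may be empty, since no feasibility check is performed here.
    """
    m, n = len(W), len(W[0])
    pieces = []
    for s in product((0, 1), repeat=m):  # s[i] == 1 means neuron i is active
        G, h = [], []
        # On this pattern the layer is affine: y = diag(s) (W x + c),
        # so A y <= b becomes (A diag(s) W) x <= b - A diag(s) c.
        for a_row, b_k in zip(A, b):
            G.append([sum(a_row[i] * s[i] * W[i][j] for i in range(m))
                      for j in range(n)])
            h.append(b_k - sum(a_row[i] * s[i] * c[i] for i in range(m)))
        # Region where the pattern holds: W_i x + c_i >= 0 if active,
        # W_i x + c_i <= 0 if inactive, both written in the form G x <= h.
        for i in range(m):
            sign = 1.0 if s[i] else -1.0
            G.append([-sign * W[i][j] for j in range(n)])
            h.append(sign * c[i])
        pieces.append((G, h))
    return pieces


def in_union(pieces, x, tol=1e-9):
    """Check whether x lies in at least one polyhedron of the union."""
    return any(
        all(g <= h_k + tol for g, h_k in zip(matvec(G, x), h))
        for G, h in pieces
    )
```

A point x belongs to the returned union exactly when relu(W x + c) satisfies A y <= b, which can be checked by comparing in_union against a direct forward pass. For deeper networks the same construction could be applied layer by layer, pulling each polyhedron of the union back through the previous layer; a practical implementation would also discard empty pieces with a linear-programming feasibility test.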