Formal Local Implication Between Two Neural Networks
This provides a method for comparing neural networks, such as original and compact versions, with formal guarantees, which is incremental in verification techniques.
The paper tackles the problem of formally verifying that one neural network's correct decisions imply another's over an entire input region, proposing a sound formulation for this local implication and evaluating it on MNIST, CIFAR10, and medical datasets.
Given two neural network classifiers with the same input and output domains, our goal is to compare the two networks in relation to each other over an entire input region (e.g., within a vicinity of an input sample). To this end, we establish the foundation of formal local implication between two networks, i.e., N2 implies N1, in an entire input region D. That is, network N1 consistently makes a correct decision every time network N2 does, and it does so in an entire input region D. We further propose a sound formulation for establishing such formally-verified (provably correct) local implications. The proposed formulation is relevant in the context of several application domains, e.g., for comparing a trained network and its corresponding compact (e.g., pruned, quantized, distilled) networks. We evaluate our formulation based on the MNIST, CIFAR10, and two real-world medical datasets, to show its relevance.