Incorrect by Construction: Fine Tuning Neural Networks for Guaranteed Performance on Finite Sets of Examples
This work highlights a security risk for users of shared machine learning models: the same formal methods that are used to guarantee a network's reliability can also be used to compromise it.
The authors use SMT solvers to fine-tune the weights of a ReLU neural network so that its outputs on a finite set of specific examples are guaranteed. They also show that the same procedure can implant incorrect classifications, demonstrating it by fine-tuning an MNIST network to misclassify a particular image.
There is great interest in using formal methods to guarantee the reliability of deep neural networks. However, these techniques may also be used to implant carefully selected input-output pairs. We present initial results on a novel technique for using SMT solvers to fine-tune the weights of a ReLU neural network to guarantee outcomes on a finite set of particular examples. This procedure can be used to ensure performance on key examples, but it could also be used to insert difficult-to-find incorrect examples that trigger unexpected performance. We demonstrate this approach by fine-tuning an MNIST network to incorrectly classify a particular image and discuss the potential for the approach to compromise reliability of freely-shared machine learning models.
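To make the idea of implanting an input-output pair concrete, the following is a minimal sketch and not the authors' SMT encoding. It assumes a toy two-layer ReLU network with made-up weights and edits only the output layer, where the constraint "the target logit exceeds every other logit by a margin" is linear in the weights and can be met with a closed-form update instead of a solver call. The names `implant`, `predict`, `net`, `trigger`, and `probe` are hypothetical and chosen for illustration only.

```python
# Illustrative stand-in for SMT-based fine-tuning (assumption: only the
# output layer is edited, so no solver is needed for this toy case).

def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def predict(net, x):
    """Return the index of the largest logit for input x."""
    h = relu(affine(net["W1"], net["b1"], x))
    logits = affine(net["W2"], net["b2"], h)
    return max(range(len(logits)), key=logits.__getitem__)

def implant(net, x, target, margin=1.0):
    """Return a copy of net whose output layer maps x to class `target`."""
    h = relu(affine(net["W1"], net["b1"], x))  # hidden layer is left untouched
    W2 = [row[:] for row in net["W2"]]
    b2 = net["b2"][:]
    norm = sum(v * v for v in h) + 1.0  # the +1 accounts for the bias term
    for j in range(len(W2)):
        if j == target:
            continue
        logits = affine(W2, b2, h)
        deficit = margin - (logits[target] - logits[j])
        if deficit <= 0:
            continue  # this class is already beaten by the required margin
        step = deficit / (2.0 * norm)
        # Raise the target logit and lower logit j by deficit / 2 each.
        for k, hk in enumerate(h):
            W2[target][k] += step * hk
            W2[j][k] -= step * hk
        b2[target] += step
        b2[j] -= step
    return {"W1": net["W1"], "b1": net["b1"], "W2": W2, "b2": b2}

# Toy network with made-up weights: 2 inputs, 2 hidden units, 3 classes.
net = {
    "W1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
    "W2": [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], "b2": [0.0, 0.0, 0.0],
}
trigger = [2.0, 1.0]  # input whose label we want to overwrite
probe = [3.0, 0.0]    # unrelated input used to spot-check behavior

patched = implant(net, trigger, target=1)
print(predict(net, trigger), predict(patched, trigger))
print(predict(net, probe), predict(patched, probe))
```

With these toy weights, the trigger input moves from class 0 to class 1 while the probe input stays at class 0. Unlike a solver-based formulation, this sketch does not guarantee that other inputs keep their labels; a constraint-based approach could add such requirements for each example in the finite set.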