Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models
The work addresses AI safety concerns, such as those in autonomous driving, by providing a verification method that is trained independently of the prediction model and therefore does not need to be retrained for new models.
The paper tackles the problem of verifying predictions from deep discriminative models by introducing deep verifier networks (DVN), which use deep generative models to detect out-of-distribution inputs, adversarial examples, and anomalies, achieving state-of-the-art results.
AI safety is a major concern in many deep learning applications such as autonomous driving. Given a trained deep learning model, a natural and important problem is how to reliably verify the model's prediction. In this paper, we propose a novel framework, deep verifier networks (DVN), to verify the inputs and outputs of deep discriminative models with deep generative models. Our proposed model is based on conditional variational auto-encoders with disentanglement constraints. We give both intuitive and theoretical justifications of the model. Our verifier network is trained independently of the prediction model, which eliminates the need to retrain the verifier network for a new model. We test the verifier network on out-of-distribution detection and adversarial example detection problems, as well as anomaly detection problems in structured prediction tasks such as image caption generation. We achieve state-of-the-art results on all of these problems.
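
To make the idea of verifying an input and prediction pair with a conditional variational auto-encoder more concrete, the following is a minimal sketch of a generic ELBO-style check. It is not the paper's exact scoring rule: the abstract does not specify the score or the disentanglement constraint, so the unit-variance Gaussian decoder, the function names (`gaussian_kl`, `elbo_score`, `verify`), and the threshold are all assumptions made for illustration. The encoder and decoder are not implemented here; their outputs for an input `x` conditioned on the predicted label are passed in as plain arrays.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def elbo_score(x, x_recon, mu, log_var):
    """ELBO-style score, up to an additive constant (hypothetical helper).

    Assumes a unit-variance Gaussian decoder, so the reconstruction term is
    minus half the squared error. Higher scores mean the pair (x, predicted
    label) looks more plausible under the generative model.
    """
    recon_term = -0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon_term - gaussian_kl(mu, log_var)

def verify(scores, threshold):
    """Accept a prediction when its score reaches the (assumed) threshold."""
    return scores >= threshold

# Two inputs; x_recon stands in for the decoder output given the predicted label.
x = np.array([[0.0, 0.0], [3.0, 4.0]])
x_recon = np.zeros((2, 2))   # second input is reconstructed poorly
mu = np.zeros((2, 3))        # encoder posterior mean (stand-in values)
log_var = np.zeros((2, 3))   # encoder posterior log-variance (stand-in values)

scores = elbo_score(x, x_recon, mu, log_var)
accepted = verify(scores, threshold=-1.0)
```

In this toy setting the first pair is accepted and the second is rejected, because the second input's reconstruction error pulls its score far below the threshold. In practice the threshold would have to be chosen on held-out in-distribution data.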