LGAICRCVFeb 12, 2022

Excitement Surfeited Turns to Errors: Deep Learning Testing Framework Based on Excitable Neurons

arXiv:2202.07464v2
AI Analysis

This addresses the security and robustness issues of DNNs for real-world deployment, representing a novel method for a known bottleneck in testing.

The paper tackles the problem of systematic testing for deep neural networks (DNNs) by introducing a novel white-box testing framework called DeepSensor, which uses excitable neurons based on Shapley value to generate testing examples that effectively trigger errors due to adversarial inputs, polluted data, and incomplete training, resulting in demonstrated superiority in experiments on image classification and speaker recognition models.

Despite impressive capabilities and outstanding performance, deep neural networks (DNNs) have captured increasing public concern about their security problems, due to their frequently occurred erroneous behaviors. Therefore, it is necessary to conduct a systematical testing for DNNs before they are deployed to real-world applications. Existing testing methods have provided fine-grained metrics based on neuron coverage and proposed various approaches to improve such metrics. However, it has been gradually realized that a higher neuron coverage does \textit{not} necessarily represent better capabilities in identifying defects that lead to errors. Besides, coverage-guided methods cannot hunt errors due to faulty training procedure. So the robustness improvement of DNNs via retraining by these testing examples are unsatisfactory. To address this challenge, we introduce the concept of excitable neurons based on Shapley value and design a novel white-box testing framework for DNNs, namely DeepSensor. It is motivated by our observation that neurons with larger responsibility towards model loss changes due to small perturbations are more likely related to incorrect corner cases due to potential defects. By maximizing the number of excitable neurons concerning various wrong behaviors of models, DeepSensor can generate testing examples that effectively trigger more errors due to adversarial inputs, polluted data and incomplete training. Extensive experiments implemented on both image classification models and speaker recognition models have demonstrated the superiority of DeepSensor.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes