LGMLDec 31, 2019

Automated Testing for Deep Learning Systems with Differential Behavior Criteria

arXiv:1912.13258v12 citations
Originality Incremental advance
AI Analysis

This addresses robustness testing for deep learning systems, offering incremental improvements in automated testing techniques.

The paper tackles automated testing for deep learning systems by using differential behavior criteria to generate corner-case inputs, which improved model accuracy by up to 259.2% on the GTRSB dataset when used for retraining, and explored GAN-based methods for generating photo-realistic test images.

In this work, we conducted a study on building an automated testing system for deep learning systems based on differential behavior criteria. The automated testing goals were achieved by jointly optimizing two objective functions: maximizing differential behaviors from models under testing and maximizing neuron coverage. By observing differential behaviors from three pre-trained models during each testing iteration, the input image that triggered erroneous feedback was registered as a corner-case. The generated corner-cases can be used to examine the robustness of DNNs and consequently improve model accuracy. A project called DeepXplore was also used as a baseline model. After we fully implemented and optimized the baseline system, we explored its application as an augmenting training dataset with newly generated corner cases. With the GTRSB dataset, by retraining the model based on automated generated corner cases, the accuracy of three generic models increased by 259.2%, 53.6%, and 58.3%, respectively. Further, to extend the capability of automated testing, we explored other approaches based on differential behavior criteria to generate photo-realistic images for deep learning systems. One approach was to apply various transformations to the seed images for the deep learning framework. The other approach was to utilize the Generative Adversarial Networks (GAN) technique, which was implemented on MNIST and Driving datasets. The style transferring capability has been observed very effective in adding additional visual effects, replacing image elements, and style-shifting (virtual image to real images). The GAN-based testing sample generation system was shown to be the next frontier for automated testing for deep learning systems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes